Notre_Dame_dataset.zip (169.38 MB)

Notre-Dame Cathedral Fire dataset

Version 2 2020-02-04, 00:50
Version 1 2020-02-02, 15:12
dataset
posted on 2020-02-04, 00:50 authored by Rafael Padilha, Fernanda Andalo, Luís Pereira, Bahram Lavi, Anderson Rocha
Number of images: 1,657, captured during or after the fire

If you use the dataset, please cite the following works:

Padilha, Rafael and Andaló, Fernanda A. and Pereira, Luís A. M. and Rocha, Anderson. “Unraveling the Notre Dame Cathedral fire in space and time: an X-coherence approach,” in Crime Science and Digital Forensics: A holistic view. CRC Press by Taylor and Francis Group.

Padilha, Rafael and Andaló, Fernanda A. and Rocha, Anderson. “Improving the chronological sorting of images through occlusion: A study on the Notre-Dame cathedral fire,” in 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.




Description of the event and data collection:
On April 15th, 2019, large parts of Notre-Dame Cathedral's structure and spire were devastated by a fire. People worldwide followed the tragic event through images and videos that were shared by the media and citizens.

From the imagery generated during the event, we collected 23,683 images posted on Twitter during the fire and on the following day. Although most were related to the event, several were memes, cartoons, compositions, and artwork, and some depicted the cathedral before the fire. Because we focus on learning how the fire and the cathedral's appearance evolved during the event, we removed these, reducing the set to 5,206 relevant images. Several of the remaining images were duplicates or near-duplicates of others. Since they contribute little to the training process, we removed them as well, leaving 1,657 distinct images related to the event.
The cleaning process used methods such as locality-sensitive hashing to filter near-duplicates, and semi-supervised approaches based on Optimum-Path Forest theory to mine relevant and non-relevant imagery of the event.
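
To make the near-duplicate filtering step more concrete, the sketch below shows one generic way locality-sensitive hashing can be applied to images. It is not the authors' pipeline: the average-hash fingerprint, the banding scheme, the function names, and the toy 4x4 "images" are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def average_hash(pixels):
    """Return a bit tuple: 1 where a pixel is brighter than the image mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if v > mean else 0 for v in flat)


def hamming(a, b):
    """Count the positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def near_duplicate_pairs(hashes, bands=4, max_distance=2):
    """Find near-duplicate pairs with LSH banding over binary hashes.

    hashes: dict mapping an image id to an equal-length bit tuple.
    Images that agree on at least one band land in the same bucket and
    become candidates; candidates are then checked by Hamming distance.
    With max_distance < bands, every true pair shares at least one band.
    """
    n_bits = len(next(iter(hashes.values())))
    rows = n_bits // bands
    buckets = defaultdict(set)
    for image_id, bits in hashes.items():
        for b in range(bands):
            key = (b, bits[b * rows:(b + 1) * rows])
            buckets[key].add(image_id)

    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))

    return sorted(
        (a, b) for a, b in candidates
        if hamming(hashes[a], hashes[b]) <= max_distance
    )


# Toy grayscale "images" (hypothetical data, not taken from the dataset).
image_a = [[10, 10, 200, 200] for _ in range(4)]
image_b = [row[:] for row in image_a]
image_b[0][0] = 250                      # small local change
image_c = [[200, 200, 10, 10] for _ in range(4)]  # very different image

hashes = {
    "a": average_hash(image_a),
    "b": average_hash(image_b),
    "c": average_hash(image_c),
}
pairs = near_duplicate_pairs(hashes)
print(pairs)
```

In this toy run only the pair (a, b) is reported, so one of the two could be dropped while c is kept. Real images would first be resized and converted to grayscale before hashing.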
By analyzing the event's description, four main sub-events can be defined: spire on fire, spire collapsing, fire continues on roof, and fire extinguished. Each sub-event contains specific visual clues (e.g., the absence of the central spire) that can be leveraged to estimate the temporal position of an image. Each image in the dataset was manually labeled as being captured in one of these sub-events. We also consider an unknown category for images that do not contain any hint of the sub-event in which they were captured, such as zoom-ins of the cathedral's facades.
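
As an illustration of how the sub-event labels could be used to reason about temporal position, here is a minimal sketch. It assumes the four sub-events are listed above in chronological order, and the label strings, `SubEvent`, `parse_label`, and `precedes` are hypothetical names; the actual annotation format in the archive may differ.

```python
from enum import IntEnum
from typing import Optional


class SubEvent(IntEnum):
    """Sub-events, numbered in the (assumed) chronological order."""
    SPIRE_ON_FIRE = 0
    SPIRE_COLLAPSING = 1
    FIRE_CONTINUES_ON_ROOF = 2
    FIRE_EXTINGUISHED = 3


def parse_label(label: str) -> Optional[SubEvent]:
    """Map a label string to a SubEvent, or None for the unknown category."""
    key = label.strip().upper().replace(" ", "_")
    if key == "UNKNOWN":
        return None
    return SubEvent[key]


def precedes(first: str, second: str) -> Optional[bool]:
    """Return True if `first` belongs to a strictly earlier sub-event than `second`.

    Returns None when either image is in the unknown category, since no
    temporal ordering can be inferred from the label alone.
    """
    a, b = parse_label(first), parse_label(second)
    if a is None or b is None:
        return None
    return a < b


print(precedes("spire on fire", "fire extinguished"))
print(precedes("unknown", "spire collapsing"))
```

Two images from the same sub-event compare as not preceding each other, so the labels give only a coarse ordering.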

In addition, each image was annotated with the cardinal or intercardinal direction of the cathedral's facade depicted in it (north, northeast, east, southeast, south, southwest, west, northwest).
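
The eight direction labels can be treated as compass bearings when comparing viewpoints. The sketch below is an assumption about how one might use the annotation, not part of the dataset tooling; `BEARINGS` and `angular_distance` are hypothetical names, and the label spelling in the archive may differ.

```python
# Standard compass bearings in degrees, measured clockwise from north.
BEARINGS = {
    "north": 0,
    "northeast": 45,
    "east": 90,
    "southeast": 135,
    "south": 180,
    "southwest": 225,
    "west": 270,
    "northwest": 315,
}


def angular_distance(direction_a: str, direction_b: str) -> int:
    """Smallest angle in degrees between two annotated directions."""
    diff = abs(BEARINGS[direction_a] - BEARINGS[direction_b])
    return min(diff, 360 - diff)


print(angular_distance("north", "northwest"))
print(angular_distance("east", "southwest"))
```

A small angular distance suggests that two images show roughly the same side of the cathedral, which can help when comparing them visually.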

Funding

DéjàVu thematic project, São Paulo Research Foundation (grants 2017/12646-3, 2017/21957-2 and 2018/16548-9)

History