synthetic_data_mimicking_CA_and_AZ.zip (296.2 MB)

Synthetic Overhead Images of Wind Turbines Made to Mimic California and Arizona

dataset

posted on 2020-07-31, 16:20, authored by Duke Dataplus2020

Overview

This is a set of synthetic overhead images of wind turbines. The images were created with CityEngine and designed to qualitatively match overhead images of wind turbines from California and Arizona. Each image has a corresponding label file in YOLOv3 format. For every wind turbine in the image, the label gives the class, the x and y coordinates of the ground truth bounding box center, and the box width and height. Each label file shares its image's base name (e.g., wnd_xview_bkg_sd0_1.png has the label file wnd_xview_bkg_sd0_1.txt). Both the images and the labels are contained in syn_CA_AZ_xview_bkg_shdw_scatter_uniform_50_wnd_v1.
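The following is a minimal sketch of reading these labels. It pairs an image with its label file and converts each label line to pixel corner coordinates. It assumes the standard Darknet/YOLO convention that the box center, width, and height are normalized to [0, 1] by the image size, and it uses the 608x608 image size given under Method. The helper names are hypothetical and are not part of the dataset.

```python
from pathlib import Path

IMAGE_SIZE = 608  # Method section: each saved render is 608x608 pixels


def label_path_for(image_path):
    """Return the label file that shares the image's base name (.png -> .txt)."""
    return Path(image_path).with_suffix(".txt")


def parse_yolo_line(line, img_w=IMAGE_SIZE, img_h=IMAGE_SIZE):
    """Convert one label line to (class_id, x_min, y_min, x_max, y_max) in pixels.

    Assumes the usual YOLO convention: x_center, y_center, width, height,
    each normalized to [0, 1] by the image width or height.
    """
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def read_labels(image_path):
    """Read every non-empty line of the image's label file (one box per line)."""
    text = label_path_for(image_path).read_text()
    return [parse_yolo_line(line) for line in text.splitlines() if line.strip()]
```

For example, the line `0 0.5 0.5 0.25 0.125` describes a 152x76 pixel box centered in the 608x608 image.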

Use

This dataset is meant to supplement the training data of an object detection model for overhead images of wind turbines. Adding it to the training set may improve the model's performance, especially if the model is tested on small wind turbines or in desert regions.

Why

This dataset was created to study the use of synthetic imagery in cross-domain testing (training on one geographic region and then testing on a very different one). When training a YOLOv3 model on the Overhead Imagery of Wind Turbines dataset, we noticed qualitatively that the model struggled much more on the images from California and Arizona. We attribute this to the small wind turbines present in these states. Because these turbines are so small, the model has less information to identify them, and often the only cue a human can notice is their shadows. This dataset was designed to improve the model's performance on these regions and this type of turbine. In our experiment, the baseline training set contained all of the real images in the Overhead Imagery of Wind Turbines dataset that were not from California or Arizona. The test set contained all of the images from California and Arizona. We trained and evaluated the baseline model. We then added this synthetic imagery to the training set and evaluated performance again on the same California and Arizona test set.
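The experimental protocol described above can be sketched as a data split. This is illustrative only. The `(path, region)` records and the helper names are hypothetical, because the region metadata of the real dataset is not described here.

```python
# Hypothetical record format: (image_path, region) pairs. The region field is
# an assumption for illustration; it is not a documented part of the datasets.
TARGET_REGIONS = {"CA", "AZ"}  # California and Arizona form the held-out domain


def cross_domain_split(records, target_regions=TARGET_REGIONS):
    """Train on every region except the targets; test only on the targets."""
    train = [path for path, region in records if region not in target_regions]
    test = [path for path, region in records if region in target_regions]
    return train, test


def augment_training_set(train, synthetic):
    """Add synthetic images to the training set only; the test set stays real."""
    return list(train) + list(synthetic)


if __name__ == "__main__":
    records = [("tx_001.png", "TX"), ("ca_001.png", "CA"), ("az_001.png", "AZ")]
    baseline_train, test = cross_domain_split(records)
    augmented_train = augment_training_set(baseline_train, ["wnd_xview_bkg_sd0_1.png"])
    print(baseline_train, augmented_train, test)
```

Both models are scored on the same test list. Any difference in performance can therefore be attributed to the added synthetic images rather than to a change in the evaluation data.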

Method

To create the dataset, we selected background images from https://figshare.com/articles/dataset/Power_Plant_Satellite_Imagery_Dataset/5307364. We chose images that were not contained in the Overhead Imagery of Wind Turbines dataset and that had little infrastructure that would make the scene look unrealistic. A script then selected these backgrounds at random and generated 3D models of both small and large wind turbines distributed uniformly over each image. It then positioned the virtual camera to save four 608x608 pixel images. The process was repeated with the same random seed, but with no background image and with the wind turbines colored black. Finally, these black-and-white images were converted into ground truth labels by grouping their black pixels.
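The last step, turning a black-on-white render into labels, can be sketched as connected-component labeling. This is an assumption about how the grouping was done, because the text only says that the black pixels were grouped. The sketch uses 8-connectivity, a gray-level threshold of 128, and class id 0, all of which are hypothetical choices. It takes the mask as a 2D list of gray values; loading the PNG (e.g., with Pillow) is omitted.

```python
from collections import deque


def mask_to_boxes(mask, threshold=128):
    """Group dark pixels into 8-connected components and return one box each.

    mask: 2D list of gray values (dark = turbine, light = empty background).
    Returns (x_min, y_min, x_max, y_max) tuples with inclusive pixel indices.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or mask[y][x] >= threshold:
                continue
            # Breadth-first flood fill from this unvisited dark pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and mask[ny][nx] < threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def boxes_to_yolo(boxes, img_w, img_h, class_id=0):
    """Format pixel boxes as normalized YOLO lines: class xc yc width height."""
    lines = []
    for x0, y0, x1, y1 in boxes:
        bw, bh = x1 - x0 + 1, y1 - y0 + 1
        xc, yc = x0 + bw / 2, y0 + bh / 2
        lines.append(f"{class_id} {xc / img_w:.6f} {yc / img_h:.6f} "
                     f"{bw / img_w:.6f} {bh / img_h:.6f}")
    return lines
```

Because both render passes share the same random seed, each component in the mask lines up with a turbine in the rendered scene. The resulting lines can therefore serve directly as that image's label file. One limitation of pure pixel grouping is that turbines whose silhouettes touch would merge into a single box.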