Aerial imagery object identification dataset for building and road detection, and building height estimation
Automated object detection in high-resolution aerial imagery can provide valuable information in fields ranging from urban planning and operations to economic research. However, automating the analysis of aerial imagery requires training data for machine learning algorithm development, and this dataset seeks to meet that need. For 25 locations across 9 U.S. cities, this dataset provides (1) high-resolution aerial imagery; (2) annotations of over 40,000 building footprints (OSM shapefiles) as well as road polylines; and (3) topographical height data (Lidar). This dataset can be used as ground truth to train computer vision and machine learning algorithms for object identification and analysis, in particular for building detection and height estimation, as well as road detection.
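As a rough illustration of how the three components could be combined into training labels, the sketch below rasterizes a single building footprint into a pixel mask and derives a height label from an elevation grid. It is a minimal sketch under stated assumptions, not the dataset's documented workflow: the function names, the pixel-coordinate polygon, the 6 x 6 grid, and the elevation values are all hypothetical, and the footprint is assumed to have already been reprojected into the image's pixel coordinates. The "median roof elevation minus median surrounding ground elevation" rule is one simple choice among many.

```python
from statistics import median


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_footprint(polygon, width, height):
    """Return a height x width mask; a pixel is 1 if its center is inside the polygon."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


def estimate_building_height(elevation, mask):
    """Median elevation inside the footprint minus median elevation outside it."""
    roof, ground = [], []
    for elev_row, mask_row in zip(elevation, mask):
        for value, flag in zip(elev_row, mask_row):
            (roof if flag else ground).append(value)
    if not roof or not ground:
        return None  # nothing to compare against
    return median(roof) - median(ground)


# Hypothetical footprint in pixel coordinates (not taken from the dataset).
footprint = [(1, 1), (4, 1), (4, 4), (1, 4)]
mask = rasterize_footprint(footprint, width=6, height=6)

# Synthetic elevation grid: 10.0 m ground, 22.5 m roof inside the footprint.
elevation = [[22.5 if flag else 10.0 for flag in row] for row in mask]

building_height = estimate_building_height(elevation, mask)
print(sum(map(sum, mask)), building_height)
```

With real data, the mask would serve as the building-detection label for the corresponding image tile and the height value as the regression target. A geospatial library would normally handle the reprojection and rasterization steps.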
A complete description of the data can be found in the PDF file within the metadata for this collection, titled “Metadata - building (area and height) and road dataset”. The collection is organized so that each of the 9 cities with available data has its own dataset (Arlington, MA; Atlanta, GA; Austin, TX; Washington, DC; New Haven, CT; New York City, NY; Norfolk, VA; San Francisco, CA; Seekonk, MA). A map of these cities and example images can be found here: http://arcg.is/2afcSOk.
Imagery data are from the United States Geological Survey (USGS); building and road shapefiles are from OpenStreetMap (OSM) (these OSM data are made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/); and the Lidar data are from the U.S. National Oceanic and Atmospheric Administration (NOAA) and the Texas Natural Resources Information System (TNRIS).