SpaceNet is a corpus of commercial satellite imagery and labeled training data being made available at no cost to the public to foster innovation in the development of computer vision algorithms to automatically extract information from remote sensing data.

The current SpaceNet corpus includes approximately 1,900 square kilometers of full-resolution 50 cm imagery collected from DigitalGlobe’s WorldView-2 commercial satellite, including 8-band multispectral data. The dataset also includes 220,594 building footprints derived from this imagery, which can be used as training data for machine learning. This dataset is being made public to advance the development of algorithms that automatically extract geometric features such as roads, building footprints, and points of interest from satellite imagery. The first Area of Interest (AOI) to be released is Rio de Janeiro, Brazil.

The satellite imagery, along with training data, is provided via a collaboration between DigitalGlobe, CosmiQ Works, and NVIDIA.

Note that this is an initial release of the data. More areas of interest will be added quarterly.

The SpaceNet dataset is being released in several Areas of Interest. All AOIs will follow a similar directory structure and data format. The imagery is provided as GeoTIFF files with corresponding GeoJSON building footprints. You can use the following aws-cli command to examine all files available in the dataset (details of the file structure are given below):

aws s3 ls spacenet-dataset --request-payer requester

For more detailed information on how to access specific files within the dataset, see here.

The spacenet-dataset S3 bucket is provided as a Requester Pays bucket; see here for more information.
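The same listing can be done from Python. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are configured; the helper names `make_client` and `list_keys` are illustrative and not part of the dataset or of any SpaceNet tooling. It passes `RequestPayer="requester"` so that the Requester Pays bucket accepts the request.

```python
def make_client():
    """Create an S3 client for the bucket's region (us-east-1).

    boto3 is imported lazily so that list_keys() can be exercised
    with any object exposing the same paginator interface.
    """
    import boto3
    return boto3.client("s3", region_name="us-east-1")


def list_keys(client, bucket="spacenet-dataset", prefix="", limit=20):
    """Return up to `limit` object keys under `prefix`.

    RequestPayer="requester" acknowledges that the caller pays for
    the request, which a Requester Pays bucket requires.
    """
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket, Prefix=prefix, RequestPayer="requester"
    )
    for page in pages:
        # Pages with no matching objects carry no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
            if len(keys) >= limit:
                return keys
    return keys
```

Calling `list_keys(make_client())` with valid credentials should return the first few keys in the bucket; standard request charges apply to the caller's account.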

Each AOI contains two logical directories, srcData and processedData. The srcData directory contains the "raw" representation of the raster data (i.e., the full imagery mosaic) and vector data (i.e., the building footprints for the entire AOI). The processedData directory contains data formatted for consumption by machine learning algorithms (i.e., chipped into 200 by 200 meter squares). Further details on format and directory structure can be found below.

Inside srcData there are four separate folders:

  • buildingLabels: This folder contains, in GeoJSON format, the extent of the labeled area within the AOI and the building labels for the imagery. The GeoJSONs are in the EPSG:4326 projection.
  • mosaic_3band: This folder contains 20 pan-sharpened 3-band (Red, Green and Blue) GeoTIFF files that combine to make a mosaic of Rio de Janeiro. They have a resolution of ~0.5m.
  • mosaic_8band: This folder contains 20 8-band (Coastal, Blue, Green, Yellow, Red, Red Edge, Near-IR1 and Near-IR2) GeoTIFF files that combine to make a mosaic of Rio de Janeiro. They have a resolution of ~1.9m.
  • vectorData: This folder contains point of interest (POI) data and source building footprints over the area of interest in both Esri GeoDatabase and GeoJSON (with JPEG) formats.  
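As a rough guide to what the 200 by 200 meter chips look like in pixels, the arithmetic below combines the chip size with the approximate resolutions quoted for the two mosaics. This is only a sketch: the resolutions are approximate, and the sizes of the actual chips may differ from these nominal values. The function name `chip_side_pixels` is illustrative.

```python
def chip_side_pixels(chip_side_m, resolution_m):
    """Nominal number of pixels along one side of a square chip."""
    return round(chip_side_m / resolution_m)


CHIP_SIDE_M = 200.0  # chip size stated for processedData

# Approximate ground resolutions quoted for the two mosaics.
for name, resolution_m in (("3-band", 0.5), ("8-band", 1.9)):
    side = chip_side_pixels(CHIP_SIDE_M, resolution_m)
    print(f"{name}: about {side} x {side} pixels")
```

Under these assumptions a 3-band chip is about 400 pixels on a side and an 8-band chip about 105 pixels on a side.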

Inside processedData:

  • 3band.tar.gz: This compressed tar archive contains 7186 3-band GeoTIFF files. These files were created by cutting the srcData above into 200m x 200m images.
  • 8band.tar.gz: This compressed tar archive contains 7186 8-band GeoTIFF files. These files were created by cutting the srcData above into 200m x 200m images.
  • geoJson.tar.gz: This compressed tar archive contains 7186 GeoJSON files that contain the building labels for each GeoTIFF file in EPSG:4326.
  • summaryData: This folder contains a CSV and GeoJSON file for each of the 3-band and 8-band datasets. Details of these file formats are given below.

Fields in the summaryData files (the formats in which each field appears are given in parentheses):

  • ImageId (CSV, GeoJSON): This field links the building with its associated 200m x 200m clip.
  • BuildingId (CSV, GeoJSON): This field differentiates between buildings within a clip. The index starts at 0. If a chip does not have any buildings, the BuildingId is set to -1.
  • PolygonWKT_Pix (CSV): This field describes the geometry as a polygon in WKT format. The coordinates given are in pixel coordinate space with regard to the clip. If a chip does not have any buildings, this field contains “POLYGON ((0 0, 0 0, 0 0, 0 0))”.
  • PolygonWKT_Geo (CSV): This field describes the geometry as a polygon in WKT format. The coordinates are given in EPSG:4326. If a chip does not have any buildings, this field contains “POLYGON ((0 0, 0 0, 0 0, 0 0))”.
  • Geometry (GeoJSON): This field describes the geometry as a polygon in WKT format. The coordinates given are in pixel coordinate space with regard to the clip.
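
The field descriptions above suggest a simple way to read the summary CSV. The sketch below uses only the Python standard library and assumes that the CSV has a header row with the field names listed above; the sample rows, the chip names, and the helper names `parse_polygon_wkt` and `buildings_by_image` are hypothetical. It groups pixel-space building outlines by ImageId and treats a BuildingId of -1 as a chip without buildings.

```python
import csv
import io

EMPTY_BUILDING_ID = -1  # per the field description: chip has no buildings


def parse_polygon_wkt(wkt):
    """Return the exterior ring of a WKT POLYGON as (x, y) tuples.

    Only the first ring is read; any extra ordinates (e.g. z) are ignored.
    """
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError(f"not a POLYGON: {wkt!r}")
    start = text.index("((") + 2
    end = text.index(")", start)
    ring = []
    for vertex in text[start:end].split(","):
        parts = vertex.split()
        ring.append((float(parts[0]), float(parts[1])))
    return ring


def buildings_by_image(csv_text):
    """Group pixel-space building rings by ImageId, keeping empty chips."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rings = result.setdefault(row["ImageId"], [])
        if int(row["BuildingId"]) == EMPTY_BUILDING_ID:
            continue  # placeholder polygon, not a real building
        rings.append(parse_polygon_wkt(row["PolygonWKT_Pix"]))
    return result


# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE_CSV = (
    "ImageId,BuildingId,PolygonWKT_Pix,PolygonWKT_Geo\n"
    'chip_a,0,"POLYGON ((10 10, 20 10, 20 20, 10 10))",'
    '"POLYGON ((-43.1 -22.9, -43.0 -22.9, -43.0 -22.8, -43.1 -22.9))"\n'
    'chip_b,-1,"POLYGON ((0 0, 0 0, 0 0, 0 0))",'
    '"POLYGON ((0 0, 0 0, 0 0, 0 0))"\n'
)

for image_id, rings in buildings_by_image(SAMPLE_CSV).items():
    print(image_id, len(rings), "building(s)")
```

Keeping chips without buildings in the result (with an empty list) preserves them as negative examples, which is often useful when preparing training data.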

Source: DigitalGlobe, Inc.
Category: Computer Vision, Geospatial
Format: GeoTIFF, GeoJSON
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
Storage Service: Amazon S3
Location: s3://spacenet-dataset in us-east-1
Update Frequency: New imagery and features are added quarterly

NVIDIA demonstrates how DIGITS, their deep learning GPU training system, can be used to train two different types of convolutional neural networks for detecting buildings in the SpaceNet 3-band imagery.

See how NVIDIA is using SpaceNet data here.

Development Seed provides scripts for setting up the SpaceNet dataset for training a SegNet model via their open source package.

View the code to see how to get started using SpaceNet data here.

CosmiQ Works developed scripts to preprocess satellite imagery for consumption in machine learning frameworks and evaluation code to measure the effectiveness of object detection results.

Sample code and data can be found here.