AWS hosts a variety of public datasets that anyone can access for free.

Previously, large datasets such as satellite imagery or genomic data required hours or days to locate, download, customize, and analyze. When data is made publicly available on AWS, anyone can analyze any volume of it without needing to download or store it themselves. These datasets can be analyzed using AWS compute and data analytics products, including Amazon EC2, Amazon Athena, AWS Lambda, and Amazon EMR.
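As a rough illustration of working with a dataset in place, the sketch below lists a few objects in a public bucket over plain HTTPS instead of downloading the whole dataset. It assumes the bucket allows anonymous listing, which not every public dataset does. The bucket name and prefix in the usage note are hypothetical placeholders; the real locations are given on each dataset's own page. The helper names `list_url`, `parse_listing`, and `list_public_objects` are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespace used by Amazon S3 REST API responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(bucket, prefix="", max_keys=10):
    """Build an unsigned ListObjectsV2 URL for a bucket (no credentials)."""
    query = urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]


def list_public_objects(bucket, prefix="", max_keys=10):
    """Fetch one page of object keys and sizes from a publicly listable bucket."""
    with urlopen(list_url(bucket, prefix, max_keys)) as response:
        return parse_listing(response.read())
```

Calling `list_public_objects("example-public-dataset", prefix="2017/")` with a real bucket name in place of the placeholder would return one page of keys and sizes. From there, an analysis job running on Amazon EC2 or AWS Lambda could read only the objects it needs.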

Learn more about working with geospatial data on AWS at Earth on AWS.

  • Landsat on AWS: An ongoing collection of satellite imagery of all land on Earth produced by the Landsat 8 satellite.
  • Sentinel-2 on AWS: An ongoing collection of satellite imagery of all land on Earth produced by the Sentinel-2 satellite.
  • GOES on AWS: GOES provides continuous weather imagery and monitoring of meteorological and space environment data across North America.
  • SpaceNet on AWS: A corpus of commercial satellite imagery and labeled training data to foster innovation in the development of computer vision algorithms.
  • OpenStreetMap on AWS: OSM is a free, editable map of the world, created and maintained by volunteers. Regular OSM data archives are made available in Amazon S3.
  • MODIS on AWS: Select products from the Moderate Resolution Imaging Spectroradiometer (MODIS) managed by the U.S. Geological Survey and NASA.
  • Terrain Tiles: A global dataset providing bare-earth terrain heights, tiled for easy usage and provided on S3.
  • NAIP: 1-meter aerial imagery captured during the agricultural growing seasons in the continental U.S.
  • NEXRAD on AWS: Real-time and archival data from the Next Generation Weather Radar (NEXRAD) network.
  • NASA NEX: A collection of Earth science datasets maintained by NASA, including climate change projections and satellite images of the Earth's surface.
  • District of Columbia LiDAR: LiDAR point cloud data for Washington, DC.
  • EPA Risk-Screening Environmental Indicators: Detailed air model results from EPA’s Risk-Screening Environmental Indicators (RSEI) model.
  • HIRLAM Weather Model: HIRLAM (High Resolution Limited Area Model) is an operational synoptic and mesoscale weather prediction model managed by the Finnish Meteorological Institute.

Learn more about genomics in the cloud.

  • 1000 Genomes Project: A detailed map of human genetic variation.
  • TCGA on AWS: Raw and processed genomic, transcriptomic, and epigenomic data from The Cancer Genome Atlas (TCGA) available to qualified researchers via the Cancer Genomics Cloud.
  • ICGC on AWS: Whole genome sequence data available to qualified researchers via The International Cancer Genome Consortium (ICGC).
  • 3000 Rice Genome on AWS: Genome sequence of 3,024 rice varieties.
  • Genome in a Bottle (GIAB): Several reference genomes to enable translation of whole human genome sequencing to clinical practice.

Learn more about artificial intelligence and machine learning on AWS.

  • Common Crawl: A corpus of web crawl data composed of over 5 billion web pages.
  • Amazon Bin Image Dataset: Over 500,000 bin JPEG images and corresponding JSON metadata files describing products in an operating Amazon Fulfillment Center.
  • GDELT: Over a quarter-billion records monitoring the world's broadcast, print, and web news from nearly every corner of every country, updated daily.
  • Multimedia Commons: A collection of nearly 100M images and videos with audio and visual features and annotations.
  • Google Books Ngrams: A dataset containing Google Books n-gram corpora.
  • SpaceNet on AWS: A corpus of commercial satellite imagery and labeled training data to foster innovation in the development of computer vision algorithms.
  • IRS 990 Filings on AWS: Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present.
  • ACS PUMS on AWS: U.S. Census American Community Survey (ACS) Public Use Microdata Sample (PUMS) is available in a linked data format using the Resource Description Framework (RDF) data model.
  • USAspending.gov on AWS: USAspending.gov database, which includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more.