AWS Compute Blog
Building a Serverless Interface for Global Satellite Imagery
Update (February 19, 2021):
The URL referenced in this article is no longer maintained by AWS, but you can still find the open source code used in the project at https://github.com/awslabs/landsat-on-aws.
This is a guest post by Joe Flasher, Technical Business Development Manager.
In March 2015, we launched Landsat on AWS, a public dataset made up of imagery from the Landsat 8 satellite. Within the first year of launching Landsat on AWS, we logged over 1 billion requests for Landsat data, and we have been inspired by our customers' innovative uses of the data. The dataset is large, consisting of hundreds of terabytes spread across millions of objects, and it is growing by roughly 1.4 TB per day.
It’s also a very beautiful dataset, as the Landsat 8 satellite regularly captures striking images of our planet. This is why we created landsatonaws.com, a website that makes the imagery easy to browse.
With so many files, it can be challenging to browse through them in a human-friendly manner or to visually explore the full breadth of the imagery. We wanted to create an interface that would let users walk through the imagery, but would also act as a referential catalog, providing data about the imagery following a well-structured URL format. In addition, we wanted the interface to be:
- Fast. Pages should load quickly.
- Lightweight. Hosting 100,000,000 pages should cost less than $100 per month.
- Indexable. All path/row pages of the site should be indexable by search engines.
- Linkable. All unique pages of the site should have a well-structured URL.
- Up to date. New imagery is added daily, and the site should always reflect the latest data.
To meet all these needs, we employed a serverless architecture that brought together AWS Lambda, Amazon API Gateway, and Amazon S3 to power https://landsatonaws.com.
With API Gateway and Lambda, we have a powerful pairing that can handle incoming HTTP requests, run a Lambda function, and return the response to the requester. We also have an S3 bucket where we store a small amount of aggregated metadata that helps the Lambda functions respond quickly to requests.
There is a two-fold advantage to this sort of architecture. Firstly, we have no server to worry about and only pay when a user makes a request. Secondly, while the underlying imagery dataset is very large, the data needed to power the front-end interface is very small; pages are created when requested and then disappear.
Landsat 8 imagery has a well-defined naming structure and we wanted to be able to reproduce that with our page structure. To accomplish that, we provide endpoints for every combination down to the unique scene, for example:
- /L8 – All Landsat 8 imagery
- /L8/004 – All Landsat 8 imagery in path 4
- /L8/004/112 – All Landsat 8 imagery in path 4 and row 112
- /L8/004/112/LC80041122015344LGN00 – The unique page for the LC80041122015344LGN00 scene
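The routing logic behind this URL hierarchy can be sketched as follows. This is an illustrative example, not code from the actual project (the real site is implemented in Node.js); the function and field names here are hypothetical:

```python
# Hypothetical sketch: map a request path such as
# /L8/004/112/LC80041122015344LGN00 onto the sensor/path/row/scene
# hierarchy described above.

def parse_route(url_path):
    """Split a landsatonaws.com-style path into its hierarchy levels."""
    parts = [p for p in url_path.strip("/").split("/") if p]
    levels = ("sensor", "path", "row", "scene")
    # Only Landsat 8 routes exist, and routes never go deeper than a scene
    if not parts or parts[0] != "L8" or len(parts) > len(levels):
        return None  # unknown route
    return dict(zip(levels, parts))

print(parse_route("/L8/004/112/LC80041122015344LGN00"))
# {'sensor': 'L8', 'path': '004', 'row': '112', 'scene': 'LC80041122015344LGN00'}
```

Each level of the returned dictionary corresponds to one endpoint in the list above, so a single Lambda function can dispatch all four page types.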
To be able to provide fast responses to these page requests via Lambda functions, we do a small amount of upfront processing on a nightly basis. This is done via another Lambda function that’s triggered with a scheduled CloudWatch event. This function creates a few metadata files, including a list of unique path/row combos for all imagery, a sitemap.txt file for search engine indexing, a list of the last four cloud-free images (for the homepage), and a number of files with scene info, aggregated at the path level. These aggregated files look like the following:
LC80010022016230LGN00,2016-08-17 14:07:40.521957,84.04
LC80010022016246LGN00,2016-09-02 14:07:46.993260,86.66
LC80010022016262LGN00,2016-09-18 14:07:49.546837,20.02
LC80010032015115LGN00,2015-04-25 14:07:18.078839,1.45
LC80010032015131LGN00,2015-05-11 14:07:02.141468,32.71
LC80010032015147LGN00,2015-05-27 14:07:03.269496,87.21
LC80010032015163LGN00,2015-06-12 14:07:14.976409,10.72
LC80010032015179LGN00,2015-06-28 14:07:20.976541,70.1
LC80010032015195LGN00,2015-07-14 14:07:31.606418,23.26
LC80010032015211LGN00,2015-07-30 14:07:36.060404,78.89
Each line includes the scene ID, acquisition time, and cloud cover percentage.
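Reading these aggregated files at request time is a simple CSV parse. The following is a minimal sketch (not project code) showing how a Lambda function might load the lines shown above:

```python
import csv
import io
from datetime import datetime

# Parse the three-column aggregate format shown above:
# scene ID, acquisition time, cloud cover percentage.
sample = """\
LC80010022016230LGN00,2016-08-17 14:07:40.521957,84.04
LC80010022016262LGN00,2016-09-18 14:07:49.546837,20.02
"""

scenes = []
for scene_id, acquired, cloud in csv.reader(io.StringIO(sample)):
    scenes.append({
        "id": scene_id,
        "acquired": datetime.strptime(acquired, "%Y-%m-%d %H:%M:%S.%f"),
        "cloud_cover": float(cloud),
    })

# For example, pick the clearest scene for display
clearest = min(scenes, key=lambda s: s["cloud_cover"])
print(clearest["id"])  # LC80010022016262LGN00
```

Because each per-path file is small, the whole parse stays well within a Lambda function's time and memory budget.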
All of the metadata files are created by accessing the scene_list.gz file from the dataset’s S3 bucket and cutting it up in a few different ways.
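That nightly slicing step can be sketched roughly as follows. This uses an in-memory stand-in for scene_list.gz and a simplified column layout; the real function would fetch the file from S3 (for example with boto3) and write the per-path aggregates back to the metadata bucket:

```python
import gzip
import io
from collections import defaultdict

# In-memory stand-in for scene_list.gz, simplified to three columns:
# scene ID, acquisition time, cloud cover percentage.
scene_list_gz = gzip.compress(b"""\
LC80010022016230LGN00,2016-08-17 14:07:40.521957,84.04
LC80010032015115LGN00,2015-04-25 14:07:18.078839,1.45
LC80010032015131LGN00,2015-05-11 14:07:02.141468,32.71
""")

per_path = defaultdict(list)
with gzip.open(io.BytesIO(scene_list_gz), "rt") as f:
    for line in f:
        scene_id = line.split(",", 1)[0]
        # Landsat 8 scene IDs encode the WRS path in characters 3-6
        # (e.g. "001" in LC80010022016230LGN00)
        wrs_path = scene_id[3:6]
        per_path[wrs_path].append(line)

# Each path's aggregate would then be written out as its own S3 object
for wrs_path, lines in per_path.items():
    print(f"path {wrs_path}: {len(lines)} scenes")
```

Grouping at the path level keeps each aggregate file small enough for the front-end Lambda functions to fetch and parse on every request.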
For a unique scene request such as https://landsatonaws.com/L8/004/112/LC80041122015344LGN00, the Lambda function makes two calls to S3. The first request lists all keys sharing the scene ID as a prefix, so that we can display every related file; the second fetches the scene’s metadata, so that we can provide detailed information. Because our Lambda function runs in the same region as the Landsat data (though it could run anywhere), these requests are handled very quickly.
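Those two S3 calls might look like the sketch below. The boto3 calls (`list_objects_v2`, `get_object`) are real APIs, but the function itself, the bucket name, and the exact key layout are assumptions based on the public Landsat on AWS structure; consult the landsat-on-aws repository for the actual implementation:

```python
def scene_page_data(scene_id, s3=None, bucket="landsat-pds"):
    """Fetch everything needed to render one scene page: the list of
    related keys plus the scene's metadata file (layout is assumed)."""
    if s3 is None:
        import boto3  # only needed when no client is injected
        s3 = boto3.client("s3")

    # WRS path and row are encoded in the scene ID (characters 3-6 and 6-9)
    wrs_path, wrs_row = scene_id[3:6], scene_id[6:9]
    prefix = f"L8/{wrs_path}/{wrs_row}/{scene_id}/"

    # Request 1: every key under the scene's prefix (bands, thumbnails, ...)
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = [obj["Key"] for obj in listing.get("Contents", [])]

    # Request 2: the scene's detailed metadata file (assumed MTL naming)
    meta = s3.get_object(Bucket=bucket, Key=prefix + f"{scene_id}_MTL.txt")
    metadata = meta["Body"].read().decode()

    return {"keys": keys, "metadata": metadata}
```

Injecting the S3 client also makes the function easy to unit test with a stub in place of a live connection.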
Summary
By using Lambda, API Gateway, and S3, we’re able to create a fast, lightweight, indexable, and always up to date front end for a very large dataset. You can find all the code used to power https://landsatonaws.com at landsat-on-aws on GitHub. While the code is written specifically for the Landsat on AWS data, this same pattern can be used for any well-defined dataset.