AWS Storage Blog

How SkyWatch built its satellite imagery solution using AWS Lambda and Amazon EFS

SkyWatch is on a mission to democratize remote sensing data through a simple user experience. Every day, trillions of pixels of Earth observation imagery are captured by satellites orbiting our planet. New applications for this data are developed every week, with demand increasing across many industries. Examples include commercial applications, such as construction, finance, and infrastructure monitoring, and applications developed for humanitarian causes, such as coordinating disaster assessment and relief, and many more.

SkyWatch’s infrastructure for both its SkyWatch TerraStream and SkyWatch EarthCache Earth observation imagery platforms uses services from AWS. At SkyWatch, we have built an efficient and scalable processing system for satellite data with Amazon Web Services (AWS) at its core. In this post, we cover two examples of how we at SkyWatch use Amazon Elastic File System (Amazon EFS): to shorten our AWS CodeBuild build times, and to gain petabyte-scale storage for our AWS Lambda functions.

Amazon EFS for faster AWS CodeBuild times

All of our builds, testing, and releases are automated using AWS CodePipeline. For our Java-based services specifically, our Maven dependencies are stored in AWS CodeArtifact. Before using Amazon EFS in our solution, every AWS CodeBuild instance retrieved its Maven dependencies from AWS CodeArtifact when a build was initiated, leading to longer than desired build times. Build times were in excess of six minutes, and when we started using Amazon EFS they were reduced to just under three minutes.

How SkyWatch uses Amazon EFS to provide a persistent storage solution for repository dependencies to improve build times with AWS CodeBuild.

We needed fast, persistent storage in which we could save local Maven repositories. We didn’t want dependency versions that had already been downloaded, and had not changed since the last build, to be downloaded again.

The solution we built is helping SkyWatch to process satellite images faster. Using Amazon EFS enables us to configure AWS CodeBuild to utilize persistent storage, reducing our build times by 50%. Amazon EFS also enables SkyWatch to configure our solution environment to the throughput level we need during builds. Amazon EFS ensures there is sufficient bandwidth, even as the dependencies for a particular project grow.
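As a rough sketch of the idea, assume the EFS file system is mounted in the build container at a hypothetical path such as /mnt/efs. A build step could then point Maven's local repository at that mount using the standard maven.repo.local property, so that artifacts downloaded by one build are still on disk for the next. The mount path, the m2/repository layout, and the maven_command helper are illustrative assumptions, not SkyWatch's actual configuration.

```python
import shlex
from pathlib import PurePosixPath


def maven_command(efs_mount, goals=("clean", "install")):
    """Build a Maven command line whose local repository lives on EFS.

    efs_mount is an assumed mount point for the EFS file system inside
    the build container (hypothetical, e.g. "/mnt/efs").
    """
    # Persistent directory that survives across builds (assumed layout).
    repo = PurePosixPath(efs_mount) / "m2" / "repository"
    # -Dmaven.repo.local overrides the default ~/.m2/repository location.
    return ["mvn", f"-Dmaven.repo.local={repo}", *goals]


if __name__ == "__main__":
    # Print the command a build step would run.
    print(shlex.join(maven_command("/mnt/efs")))
```

With a layout like this, only dependencies that are missing from the shared repository would need to be fetched from AWS CodeArtifact on a given build.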

Amazon EFS for petabyte-scale AWS Lambda storage

Additionally, we use Amazon EFS to extend the storage available to AWS Lambda functions, where the /tmp directory is ordinarily limited to 512 MB.

The Earth observation processing system at SkyWatch relies heavily on AWS Lambda to process large, multi-band, sub-meter resolution satellite imagery products quickly and cost-effectively.

These data products are typically stored as cloud-optimized GeoTIFF files, enabling the data to be clipped and streamed as a series of tiles without requiring the entire file to be downloaded into a single AWS Lambda function. However, occasionally there is still a need to download the entire product. The product typically consists of imagery data across four spectral imaging bands (red, green, blue, and near-infrared), each one of which might be 200+ MB in size. The 800+ MB full product is thus far in excess of the 512 MB temporary storage provided by the AWS Lambda /tmp directory.
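The size arithmetic above can be written as a small check. The following is a minimal sketch, assuming a hypothetical EFS mount path of /mnt/efs and a helper named choose_scratch_dir. It sums the band sizes and falls back to the EFS mount only when the full product would not fit in /tmp. The 512 MB limit is the figure cited in this post, and everything else is an illustrative assumption.

```python
TMP_LIMIT_MB = 512       # /tmp capacity cited in this post
EFS_MOUNT = "/mnt/efs"   # hypothetical mount path for the EFS file system


def choose_scratch_dir(band_sizes_mb, tmp_dir="/tmp", efs_dir=EFS_MOUNT):
    """Return the directory to download a full product into.

    band_sizes_mb holds the size of each spectral band file in MB.
    """
    total_mb = sum(band_sizes_mb)
    # Use local /tmp when the whole product fits, otherwise the EFS mount.
    return tmp_dir if total_mb <= TMP_LIMIT_MB else efs_dir


if __name__ == "__main__":
    # Four bands (red, green, blue, near-infrared) of about 200 MB each.
    print(choose_scratch_dir([200, 200, 200, 200]))
```

A real function would likely leave some headroom below the limit for other temporary files. That margin is omitted here to keep the sketch short.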

Amazon EFS provides petabyte-scale storage capacity to the AWS Lambda functions in our solution, and enables state to be shared between them.

When data exceeds the capacity of the AWS Lambda /tmp directory, we mount Amazon EFS to provide petabyte-scale storage capacity to AWS Lambda functions. Further, because Amazon EFS storage is persistent, it enables data to be shared by the AWS Lambda functions in our processing system. Shared file storage provided by Amazon EFS enables us to share state or continue the processing of a data product where another processing step left off.
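One way such a handoff between processing steps could look is sketched below. One function writes a small JSON checkpoint to the shared file system, and a later function reads it to resume where the previous step stopped. The file naming, the state fields, and the save_checkpoint and load_checkpoint helpers are hypothetical. This is not a description of SkyWatch's pipeline, and a temporary directory stands in for the EFS mount in the demo.

```python
import json
import os
import tempfile
from pathlib import Path


def _checkpoint_path(shared_dir, product_id):
    # Hypothetical naming scheme: one checkpoint file per data product.
    return Path(shared_dir) / f"{product_id}.checkpoint.json"


def save_checkpoint(shared_dir, product_id, state):
    """Write state for a product so that a later step can pick it up."""
    target = _checkpoint_path(shared_dir, product_id)
    # Write to a temporary file in the same directory, then rename it into
    # place so a reader does not see a partially written checkpoint.
    fd, tmp_name = tempfile.mkstemp(dir=shared_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, target)
    return target


def load_checkpoint(shared_dir, product_id):
    """Return the saved state, or None if no step has written one yet."""
    target = _checkpoint_path(shared_dir, product_id)
    if not target.exists():
        return None
    return json.loads(target.read_text())


if __name__ == "__main__":
    # A temporary directory stands in for the shared EFS mount.
    with tempfile.TemporaryDirectory() as shared:
        save_checkpoint(shared, "product-001", {"bands_done": ["red", "green"]})
        print(load_checkpoint(shared, "product-001"))
```

In a deployed function, shared_dir would be the path at which the EFS access point is mounted, so every function that mounts the same file system sees the same checkpoint files.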

Conclusion

AWS lies at the core of SkyWatch’s infrastructure for both its SkyWatch TerraStream and SkyWatch EarthCache Earth Observation imagery platforms. AWS services enable us to leverage a wide variety of tools together, such as Amazon EFS with AWS CodeBuild and AWS Lambda. The solution we’ve built uses services from AWS that seamlessly enhance and complement one another. Our satellite imagery solution built on AWS enables us to deliver cost and processing efficiencies to our customers, while supplying the data they need. We look forward to continuing the partnership between SkyWatch and AWS, and building for the future together.

Thanks for reading this blog post. If you have any questions or feedback about what’s covered here, please leave a comment and the AWS team will make sure it gets back to us!

Luis Veci

Luis Veci is the Image Processing Team’s Technical Lead at SkyWatch. He has more than 15 years of experience leading software development projects in Earth Observation for ESA, CSA, DND and others, having led the development of the NEST, RADARSAT-2, and Sentinel-1 Toolboxes.

Alexander De Souza

Alexander De Souza is the Image Processing & Machine Learning Team Lead at SkyWatch. He holds a PhD in Computational Astrophysics from the University of Western Ontario and has a decade of experience developing machine learning solutions for start-ups and large corporations from San Francisco to Amsterdam.