Automatically convert satellite imagery to Cloud-Optimized GeoTIFFs for hosting in Amazon S3

The sheer size of satellite imagery has long placed a heavy burden on client software that renders these images dynamically. Typically, customers must copy imagery from a central repository to their local computers, and downloading a large image can take several minutes. Even after the download finishes, processing the full image is slow. Cloud Optimized GeoTIFFs (COGs) make cloud-based storage of these images far more efficient to use.

In this post, we demonstrate how to use serverless technologies to process and store satellite imagery. Specifically, the images are converted to efficient COGs and stored in Amazon Simple Storage Service (Amazon S3), an object storage service offering industry-leading scalability, data availability, security, and performance. By using COGs, users can access large imagery immediately and no longer have to duplicate data locally. As a result, users can start analyzing imagery and producing outputs faster. Users can also stream only the parts of an image they need instead of the whole image, leading to faster processing speeds and increased productivity.

Cloud Optimized GeoTIFF (COG)

A COG is a regular GeoTIFF but with an internal organization that enables more efficient workflows on the cloud. Users can stream only the portions of the COG that they need, resulting in faster data transfer and processing speeds. Additionally, COGs reduce data duplication. Users can access COG data without needing to copy and cache the data locally. The serverless architecture described in the following sections allows organizations to automatically convert their data to COGs.
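To make the streaming benefit concrete, the following sketch models a COG as a grid of fixed-size internal tiles and counts how many of them overlap a requested pixel window. It is a simplified illustration, not rio-cogeo or GDAL code: the 512-pixel tile size, the image dimensions, and the helper names are assumptions, and overviews and file headers are ignored.

```python
from math import ceil


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of the tiles overlapping a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def total_tiles(image_width, image_height, tile_size=512):
    """Return the number of tiles needed to cover the whole image."""
    return ceil(image_width / tile_size) * ceil(image_height / tile_size)


# A 1024 x 1024 pixel window inside a hypothetical 20000 x 20000 pixel image.
needed = tiles_for_window(col_off=1000, row_off=2000, width=1024, height=1024)
print(len(needed), "of", total_tiles(20000, 20000), "tiles")  # 9 of 1600 tiles
```

In this model, a client that only needs the window fetches 9 tiles rather than all 1,600, which is the intuition behind reading part of a COG instead of downloading the whole file.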

Solution overview

To accomplish this, we cover the following steps:

How to build a containerized AWS Lambda function, with the Python library rio-cogeo preinstalled, that handles the translation of imagery into a COG. AWS Lambda can translate imagery smaller than 10 GB.
How to create the cloud infrastructure that automatically processes new images, converts them, and stores the outputs. The input S3 bucket triggers the translation Lambda function whenever it receives an upload of satellite imagery.
How to connect a client application to the newly created COG files in Amazon S3.
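The trigger works by passing the Lambda function an event that names the bucket and the object key of each upload. The sketch below shows one way to read those fields. The sample event is trimmed to the fields used here, the helper name parse_s3_event is our own, and the object key is hypothetical. Because object keys arrive URL-encoded in S3 event notifications, the sketch decodes them before use.

```python
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return a (bucket, key) pair for every record in an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event, e.g. spaces arrive as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Trimmed-down sample event with a hypothetical object key.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "geotiff-to-cog"},
                "object": {"key": "scenes/my+image+%281%29.tif"},
            }
        }
    ]
}

print(parse_s3_event(sample_event))
```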

Prerequisites

The following prerequisites are required before continuing with the steps below:

IAM permissions to create Amazon S3 buckets
IAM permissions that allow AWS Lambda to read from and write to S3 buckets
IAM permissions to create ECR repositories and push Docker images to them
Docker installed
AWS Command Line Interface installed
QGIS software installed

Walkthrough

At a high level, you’ll perform the following steps:

Create Amazon S3 input/output buckets.
Create the AWS Lambda container image and push it to Amazon ECR.
Deploy the AWS Lambda function using the ECR image and set up the S3 trigger.
Use COGs with QGIS software.

1. Creating Amazon S3 buckets

Most GeoTIFFs are not COGs already. To conveniently convert these images to COGs, this post deploys two S3 buckets and a Lambda function. Ultimately, users can upload their imagery to an input S3 bucket which triggers the Lambda function to convert the image to a COG and upload the result to an output S3 bucket.

Start by creating the input S3 bucket using the AWS Management Console. Provide a name for the bucket and create it using the default settings. This post refers to the input S3 bucket as the ‘geotiff-to-cog’ bucket. Then create a second bucket whose name is the input bucket’s name with the text ‘-output’ appended to the end. The Lambda function in the next step relies on this naming convention to find the output bucket. This post refers to the output bucket as the ‘geotiff-to-cog-output’ bucket.
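The naming convention can be sketched as a small helper that mirrors how the handler later in this post derives the output bucket and object key. The helper name output_location is our own. The length check reflects the S3 rule that bucket names are 3 to 63 characters long, so the input bucket name needs to leave room for the ‘-output’ suffix.

```python
def output_location(input_bucket, key, suffix="-output", prefix="cog_"):
    """Return the (bucket, key) pair that the converted COG is written to."""
    output_bucket = input_bucket + suffix
    # S3 bucket names must be between 3 and 63 characters long.
    if not 3 <= len(output_bucket) <= 63:
        raise ValueError(
            f"output bucket name {output_bucket!r} must be 3-63 characters long"
        )
    return output_bucket, prefix + key


print(output_location("geotiff-to-cog", "scene.tif"))
# ('geotiff-to-cog-output', 'cog_scene.tif')
```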

2. Creating the Lambda Container Image

Next, we describe how to build the Lambda function responsible for converting input imagery to COGs. Create the handler.py, Dockerfile, and requirements.txt as follows and store in a directory such as /lambda.

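# handler.py
import json
import os
import shutil
from urllib.parse import unquote_plus

import boto3
from rio_cogeo.cogeo import cog_info, cog_translate
from rio_cogeo.profiles import cog_profiles

# Create the client once so warm invocations can reuse it.
s3 = boto3.client('s3')

def noncog_to_cog_tiff(input_img, output_img):
    # Write a COG version of input_img to output_img.
    if cog_info(input_img).COG:
        # Already a COG: copy it unchanged so there is always a file to upload.
        print('The input img is already a COG!')
        shutil.copyfile(input_img, output_img)
    else:
        print('The input img is not a COG, starting conversion of input img to COG!')
        cog_translate(input_img, output_img, cog_profiles.get("lzw"))
        if cog_info(output_img).COG:
            print(f'finished converting input img to COG! The output img is saved to {output_img}')

def handler(event, context):
    print('start')
    # /tmp can persist between warm invocations, so tolerate an existing directory.
    os.makedirs('/tmp/output', exist_ok=True)

    bucket_name = event['Records'][0]['s3']['bucket']['name']
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])
    obj_name = key.split('/')[-1]
    input_path = f'/tmp/{obj_name}'
    output_path = f'/tmp/output/{obj_name}'

    # Download the uploaded image to local ephemeral storage.
    s3.download_file(bucket_name, key, input_path)
    print(obj_name)

    noncog_to_cog_tiff(input_path, output_path)

    # Upload the result to the '<input bucket>-output' bucket.
    s3.upload_file(output_path, bucket_name + '-output', 'cog_' + key)

    # Free /tmp for later invocations that reuse this execution environment.
    os.remove(input_path)
    os.remove(output_path)

    return {
        'statusCode': 200,
        'body': json.dumps('conversion_to_cog_finished')
    }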

# Dockerfile 
FROM public.ecr.aws/lambda/python:3.9
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
COPY handler.py ${LAMBDA_TASK_ROOT}
CMD ["handler.handler"]

# requirements.txt
awslambdaric
boto3
rio-cogeo

In a terminal, navigate to the directory and execute docker build -t lambda-cog-blog:latest . (the trailing dot sets the build context to the current directory). After the Docker image has been built, you can test it locally using the command docker run -p 9000:8080 lambda-cog-blog:latest.
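While the container is running locally, the AWS base image exposes an invocation endpoint on the mapped port. The sketch below builds a minimal S3-style event and posts it to that endpoint using only the Python standard library. The helper names and the sample bucket and key are our own, and the container also needs AWS credentials and an existing object in the bucket for the handler to finish successfully.

```python
import json
import urllib.request

# Invocation path served by the runtime interface emulator in AWS Lambda base images.
LOCAL_ENDPOINT = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_s3_event(bucket, key):
    """Build the smallest S3-style event that the handler in this post reads."""
    return {"Records": [{"s3": {"bucket": {"name": bucket}, "object": {"key": key}}}]}


def invoke_local(event, url=LOCAL_ENDPOINT):
    """POST the event to the locally running container and return the parsed reply."""
    data = json.dumps(event).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the container running, calling invoke_local(build_s3_event("geotiff-to-cog", "scene.tif")) sends the test event and returns the function's response.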

Next, the Docker image is pushed to Amazon Elastic Container Registry (Amazon ECR). Amazon ECR is a fully managed container registry offering high-performance hosting so you can reliably deploy application images and artifacts anywhere. Start by tagging the Docker image and logging in to Amazon ECR. Then create the repository and push the image to it. The commands to complete the Amazon ECR transfer are as follows:

docker tag lambda-cog-blog:latest <AWS_ACCOUNT_NUMBER>.dkr.ecr.us-east-1.amazonaws.com/lambda-cog-blog:latest

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <AWS_ACCOUNT_NUMBER>.dkr.ecr.us-east-1.amazonaws.com

aws ecr create-repository --repository-name lambda-cog-blog --image-scanning-configuration scanOnPush=true --image-tag-mutability MUTABLE

docker push <AWS_ACCOUNT_NUMBER>.dkr.ecr.us-east-1.amazonaws.com/lambda-cog-blog:latest

3. Deploying Lambda from the Amazon ECR Image

Finally, the Lambda function can be deployed using the image stored in Amazon ECR. Open Lambda within the Console. Select Create function and choose the Container image category. Select Browse images and select the Lambda image that was pushed to Amazon ECR. Give the Lambda function a name and select Create function.

Figure: How to create the AWS Lambda function

After the function has been created, the Amazon S3 trigger must be set up. Start by selecting Add trigger and use Amazon S3 as the source. Choose the input bucket (here, ‘geotiff-to-cog’) as the event source and select Add at the bottom. Additionally, open the Configuration tab and, under Permissions, give the function’s role read and write permissions for Amazon S3. Depending on how large your input images are likely to be, you can also increase the memory (10 GB limit), ephemeral storage (10 GB limit), and timeout (15-minute limit) of the Lambda function under “General configuration”.

Figure: How to create the Amazon S3 trigger
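For readers who prefer to script the trigger, the following sketch builds the notification configuration that corresponds to the console steps above. The helper name, the function ARN shown in the comment, and the ‘.tif’ suffix filter are assumptions for illustration. The suffix filter is optional and limits the trigger to objects whose keys end in ‘.tif’.

```python
def build_notification_config(function_arn, suffix=".tif"):
    """Build an S3 notification configuration that invokes a Lambda function
    for every newly created object whose key ends with the given suffix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }


# Hypothetical ARN; replace it with the ARN of your own function.
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:geotiff-to-cog"
)
print(config["LambdaFunctionConfigurations"][0]["Events"])
```

The resulting dictionary can be passed as the NotificationConfiguration argument of the boto3 S3 client method put_bucket_notification_configuration, together with the input bucket name. Unlike the console, that call does not grant Amazon S3 permission to invoke the function, so that permission must already exist on the function.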

This completes the necessary architecture. Users upload imagery to the input S3 bucket, here called the ‘geotiff-to-cog’ bucket. The upload triggers the Lambda function, which downloads the image and converts it to a COG. The Lambda function then writes the COG to the output S3 bucket, here called ‘geotiff-to-cog-output’. If you want a dataset of non-COG TIF images to test the conversion with this architecture, the RarePlanes dataset is one option.

4. Access COGs with QGIS software

The images in the ‘geotiff-to-cog-output’ S3 bucket are ready to be accessed efficiently using geospatial analysis software. This post demonstrates how to access COG images with QGIS using a public version of the bucket created previously.

In the bucket, copy the Amazon S3 URL of any of the COG images that you would like to analyze in QGIS.

Figure: How to copy the Amazon S3 URL

In QGIS, open Layer => Data Source Manager. Inside the Data Source Manager, select the Add Raster Layer tab. Under Source type, select Protocol: HTTP(S), cloud, etc. and paste the Amazon S3 URL in the URI input box under Protocol. Select Add and close the Data Source Manager. The COG image loads from Amazon S3 within 5-10 seconds inside QGIS and is ready for the user to analyze. If you prefer to use a nonpublic S3 bucket, QGIS has options for entering AWS credentials so that it can access images inside the nonpublic bucket.

Figure: How to open a COG in QGIS
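The URL pasted into QGIS follows a predictable pattern, so it can also be built in code. The sketch below composes a virtual-hosted-style HTTPS URL for a public object, then prefixes it with /vsicurl/, the path form that GDAL-based tools use to read a file over HTTP. The helper names and the object key are our own, and the region is assumed to be us-east-1 to match the commands earlier in this post.

```python
from urllib.parse import quote


def public_object_url(bucket, key, region="us-east-1"):
    """Return the virtual-hosted-style HTTPS URL of a public S3 object."""
    # quote() keeps '/' so that key prefixes remain path segments.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def gdal_vsicurl_path(url):
    """Return the GDAL virtual file system path for reading a URL over HTTP."""
    return "/vsicurl/" + url


url = public_object_url("geotiff-to-cog-output", "cog_scene.tif")
print(url)
print(gdal_vsicurl_path(url))
```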

Using COGs remotely has a tradeoff. Scanning across a remote COG (panning and zooming) renders slightly more slowly than scanning across an image that has been fully downloaded and opened locally in QGIS. However, displaying a COG in a user’s geospatial software from only a URL is a clear speed advantage over downloading an entire non-COG image and then loading it in QGIS.

Cleaning up

The following instructions are for deleting the resources created in this post:

To delete the input/output S3 buckets

Sign in to the AWS Management Console and open the Amazon S3 console.
In the Buckets list, select the option next to the name of the input bucket that you created, and then choose Delete at the top of the page. If the bucket is not empty, you must choose Empty and submit ‘permanently delete’ in the input field prior to deleting the bucket.
On the Delete bucket page, confirm that you want to delete the bucket by entering the bucket name into the text field, and then choose Delete bucket.

Repeat the instructions above for the output bucket.
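The same bucket cleanup can be scripted. The following is a minimal sketch that assumes a boto3 S3 resource is passed in and that versioning is disabled on the buckets, which is the default setting used in this post. The helper name is our own. It permanently deletes every object in a bucket and then deletes the bucket itself.

```python
def empty_and_delete_bucket(s3_resource, bucket_name):
    """Permanently delete all objects in a bucket, then delete the bucket.

    s3_resource is expected to be a boto3 S3 resource, for example
    boto3.resource("s3"). Buckets with versioning enabled would also need
    their object versions removed, which this sketch does not do.
    """
    bucket = s3_resource.Bucket(bucket_name)
    bucket.objects.all().delete()  # batch-delete every object in the bucket
    bucket.delete()                # a bucket must be empty before deletion
```

For the buckets in this post, the function would be called once with ‘geotiff-to-cog’ and once with ‘geotiff-to-cog-output’.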

To delete the AWS Lambda function

Sign in to the AWS Management Console and open the AWS Lambda console.
Select Functions in the navigation sidebar.
In the Functions list, select the option next to the name of the function that you created, and then choose Actions at the top of the page. Click Delete in the dropdown menu.
Type ‘delete’ in the input box and click Delete at the bottom.

To delete the ECR repository

Sign in to the AWS Management Console and open the Amazon ECR console.
Select Repositories in the navigation sidebar.
In the Private repositories list, select the option next to the name of the repository that you created, and then choose Delete at the top of the page.
Type ‘delete’ in the input box and click Delete at the bottom.

Conclusion

In this post, we demonstrated a solution that helps you convert imagery to COGs using AWS serverless technologies. We walked through creating the necessary input and output S3 buckets and how to build a containerized Lambda function that can be triggered by inputs to Amazon S3 to convert those inputs to COGs. Finally, we demonstrated how to quickly and remotely access COG images in the output S3 bucket using geospatial analysis software QGIS.

By using COGs, users no longer have to duplicate data. Users can access large satellite imagery immediately instead of having to download it locally, which can take minutes. Users can also stream only the parts of an image they need instead of the whole image, leading to faster processing speeds. The ability to access and process data faster can boost user efficiency and productivity.

Thanks for reading this post. If you have any comments or questions, please leave them in the comments section.