Processing satellite imagery with serverless architecture
This post was written by Justin Downes, Machine Learning Consultant.
The amount of publicly available satellite imagery is growing, and the images tend to be large. Architectures that process those images for machine learning must scale to meet this demand. Since many machine learning models need smaller images of a fixed size to make predictions, large images are broken into smaller sections in a process known as chipping.
This post explains a serverless approach to chipping images and sending the results to an inference engine for predictions. The predictions are made on the smaller sections of the large image and must be projected back onto the original image. This architecture handles these projections automatically and lets you perform any post-processing you need. It is also extensible, allowing you to include additional pre- and post-processing functions, such as color correction or saving images of detected objects.
The main use case of this framework is to prepare imagery for inference pipelines. I include a reference implementation that uses Amazon Rekognition for inference. The output processing functions parse Amazon Rekognition’s specific prediction format. I note where to make changes if you use your own prediction parsing format.
While the pipeline is geared toward inference, the chipping and preprocessing functions could be used to prepare imagery for model training. Only the first half of the solution needs to be deployed (see the Deploying with AWS SAM section) and you can add any preprocessing functions you need.
For this example, you must have an AWS account and a role with sufficient access to create resources in the following services: Amazon S3, Amazon SQS, AWS Lambda, and Amazon Rekognition. The solution works as follows:
- Imagery files are stored in an S3 bucket.
- The S3 objects invoke a Lambda function, which adds object references to a queue.
- For each factor of parallelism that you want, a duplicate reference is added to the queue. Since large images can take time to chip, each duplicate message invokes another Lambda function, allowing multiple functions to chip the same image in parallel by skipping over sections handled by the others.
- The SQS queue maintains references for imagery files.
- The image chipping Lambda function retrieves references from the queue and chips images.
- Chips are written to the chip bucket.
- Chip information is written to the chip information bucket.
- Inference is run on the chips and the results are written to the predictions bucket. You can use any inference engine you prefer.
- The prediction detection Lambda function detects new predictions and retrieves the correct chip information from the chip info bucket.
- This Lambda function assumes that a reference to the chip information file is in the prediction.
- Individual predictions and chip information are paired and put onto the de-chipping queue.
- Prediction de-chipping Lambda function converts chip coordinates back into original imagery coordinate space and puts results in the output bucket.
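To make the data flowing through the pipeline concrete, the sketch below shows a hypothetical chip-info record and how a chip-space pixel maps back to the original image. The field names are illustrative assumptions, not the repository's exact schema:

```python
# Hypothetical chip-info record; field names are illustrative, not the
# exact schema used in the repository.
chip_info = {
    "source_image": "s3://imagery-bucket/scene-001.tif",  # original image
    "chip_key": "scene-001/chip_0004.png",  # chip object in the chip bucket
    "x_offset": 1024,   # left edge of the chip in original-image pixels
    "y_offset": 512,    # top edge of the chip in original-image pixels
    "width": 512,       # chip width in pixels
    "height": 512,      # chip height in pixels
}

def to_original(chip_info, x, y):
    """Map a pixel coordinate inside the chip back to the original image."""
    return chip_info["x_offset"] + x, chip_info["y_offset"] + y

print(to_original(chip_info, 100, 200))  # (1124, 712)
```

Whatever storage you choose for chip information, it must carry enough data to perform this mapping for every chip.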
Deploying with AWS SAM
This example includes an AWS Serverless Application Model (AWS SAM) template to make it easier to deploy to your own account. You can reuse components for your own solution; this approach promotes loose coupling and easier integration with other components.
Download the code used in this post from this GitHub repo. To deploy only the chipping portion of the architecture, comment out the following sections in the template.yaml file: DeChipper, PredictionDetection, and DeChippingQueue.
- Navigate to the repository:
- Build the AWS SAM application:
sam build
- Deploy the AWS SAM application:
sam deploy --guided
The Lambda function that uses Amazon Rekognition for inference is commented out to reduce the risk of unintended charges. To use Amazon Rekognition, uncomment the section “RekognitionInference” in the template.yaml file. To use your own inference endpoint, keep this section commented and deploy a separate Lambda function to read image chips from the chip bucket.
This serverless design is scalable with no need to provision infrastructure and allows you to control the throughput of images. SQS controls the throughput of artifacts that are persisted in the system. The following sections describe the components in the solution and explain how they can be configured to fit your needs.
The user interacts with three S3 buckets: dropping initial imagery into the pipeline, retrieving image chips, and retrieving final prediction output.
Other buckets are used for internal processing. They stage information for further processing in the application. The chip info bucket stores information about each image chip so that predictions can be translated back into the original, larger, image’s pixel space.
Much of the post-processing of a prediction requires information about where each chip came from in the original image. In this solution, that information is stored in an S3 bucket, the ChipInfo bucket. If another service is used to store chip information, such as DynamoDB, take care that a given prediction can be matched to the correct record containing the chip's information. The predictions bucket contains the output of the inference model, in this case Amazon Rekognition's object detection.
The SQS queues decouple functions in the pre- and post-processing of the imagery. You can extend this by adding additional queues that are processed by other functions. The Lambda functions that precede each queue can publish to multiple queues. Or you can chain multiple queues together with Lambda functions processing data between them. This allows parallel processing of information or processing of information that needs to be sequential.
The chipping queue contains references to images that must be chipped. The chipping Lambda function pulls messages off the chipping queue and partitions that image. To chip an image in parallel, each chipping function must know which parts of the image to save and which to skip. This is enabled by having multiple messages for a single image on the chipping queue where each message tells the chipping function how to chip the larger image.
The second queue, the de-chipping queue, takes the results of merging the predictions and chip information messages and stages them for any post-processing. This implementation projects those predictions back into the original image space, but you can add any other post-processing functions you choose.
Detection Lambda function: This is triggered when a new image is put into the imagery S3 bucket. For each new object it detects, an entry is pushed onto the SQS queue specified by the OUTPUT_QUEUE parameter; if the value of DUPLICATE_CHIPPERS is greater than 1, that many entries are pushed onto the queue instead. This allows the chipping Lambda function to be instantiated multiple times to chip the image in parallel.
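The fan-out step can be sketched as a pure function that builds one message per chipper worker; the message body shape is an assumption, not the repository's exact schema. In the Lambda handler, each body would then be sent to the queue named by OUTPUT_QUEUE:

```python
import json

def fan_out_messages(bucket, key, duplicate_chippers):
    """Build one queue message per chipper worker for a new image object.

    Each duplicate message carries a worker index so each chipping
    function knows which chips to handle. The body shape is illustrative.
    """
    return [
        json.dumps({"bucket": bucket, "key": key,
                    "worker": i, "workers": duplicate_chippers})
        for i in range(duplicate_chippers)
    ]

# In the Lambda handler, each body would be sent with
#   sqs.send_message(QueueUrl=os.environ["OUTPUT_QUEUE"], MessageBody=body)
messages = fan_out_messages("imagery-bucket", "scene-001.tif", 3)
print(len(messages))  # 3
```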
Chipping Lambda function: This processes larger images into smaller chips. The chip size and the stride that the chipping function takes between chips are customizable through environment variables. The function also skips alternating chips when there are parallel chipping functions. Each chip is saved to an S3 bucket, and information about the chip (original image, where the chip was taken, and so on) is written to a different S3 bucket.
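The chip-size, stride, and skip logic can be sketched as follows. Assigning chips to workers round-robin by index is an illustrative assumption; the repository's exact partitioning logic may differ:

```python
def chip_origins(img_w, img_h, chip_size, stride, worker=0, workers=1):
    """Yield (x, y) top-left corners of the chips handled by one worker.

    chip_size and stride mirror the environment variables described
    above. Chips are assigned to workers round-robin so that parallel
    chipping functions skip over each other's sections.
    """
    index = 0
    for y in range(0, img_h - chip_size + 1, stride):
        for x in range(0, img_w - chip_size + 1, stride):
            if index % workers == worker:
                yield x, y
            index += 1

# A single worker chips a 1024x1024 image into four 512x512 chips;
# two workers would split those four chips between them.
print(len(list(chip_origins(1024, 1024, 512, 512))))  # 4
```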
Prediction Lambda function: This detects predictions that have been made on the chips produced by the chipping Lambda function. It then merges these predictions with the chip's information contained in the chip information bucket. The example provided with this post shows an implementation based on Amazon Rekognition's object detection output, but this could easily be modified to parse any other inference output. After merging, it pushes each item onto an SQS queue.
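A minimal sketch of the merge step, assuming Amazon Rekognition's DetectLabels response shape (Labels, Instances, BoundingBox); this is the function you would swap out to parse a different inference format. The chip_info contents are whatever your chip information files carry:

```python
def pair_predictions(rekognition_output, chip_info):
    """Pair each detected object instance with its chip's information.

    Parses Amazon Rekognition's DetectLabels response shape; replace
    this function to support a different inference output format.
    """
    pairs = []
    for label in rekognition_output.get("Labels", []):
        for instance in label.get("Instances", []):
            pairs.append({
                "label": label["Name"],
                "confidence": instance["Confidence"],
                "box": instance["BoundingBox"],  # normalized to the chip
                "chip_info": chip_info,
            })
    return pairs

sample = {"Labels": [{"Name": "Car", "Confidence": 99.0, "Instances": [
    {"BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.3, "Height": 0.4},
     "Confidence": 98.5}]}]}
pairs = pair_predictions(sample, {"chip_key": "scene-001/chip_0004.png"})
print(pairs[0]["label"])  # Car
```

Each pair would then be pushed onto the de-chipping queue as its own message.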
De-chipping Lambda function: This Lambda function processes messages from the PREDICTION_QUEUE, performs processing, and stores the results in an S3 bucket. In this implementation, the post processing projects the prediction coordinates from the chip’s pixel space back into the original, larger image’s pixel space. Other functions that could be applied here would be to chip out each detected object or to forward to some follow-on ML model.
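The projection itself is a small calculation: Amazon Rekognition bounding boxes are normalized to the chip, so scaling by the chip dimensions and adding the chip's offset recovers original-image pixel coordinates. The chip-info field names below are illustrative assumptions:

```python
def project_box(box, chip_info):
    """Project a bounding box normalized to the chip (Rekognition-style
    Left/Top/Width/Height in [0, 1]) into the original image's pixel
    space. chip_info field names are illustrative.
    """
    return {
        "left": chip_info["x_offset"] + box["Left"] * chip_info["width"],
        "top": chip_info["y_offset"] + box["Top"] * chip_info["height"],
        "width": box["Width"] * chip_info["width"],
        "height": box["Height"] * chip_info["height"],
    }

# A box in the middle of a 512x512 chip taken at offset (1024, 512).
box = {"Left": 0.5, "Top": 0.25, "Width": 0.5, "Height": 0.5}
info = {"x_offset": 1024, "y_offset": 512, "width": 512, "height": 512}
print(project_box(box, info)["left"])  # 1280.0
```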
This post shows how to deploy an imagery processing pipeline in the AWS Cloud. It is decoupled to allow both pre- and post-processing extensions to be integrated into the pipeline more easily. Visit the code repository for further information.
This architecture uses Amazon Rekognition. To use your own inference engine, such as an Amazon SageMaker endpoint, you must modify two Lambda functions. The chip detection Lambda function must send images to your endpoint instead of Amazon Rekognition, and the Lambda function that detects the predictions must parse your endpoint's output so it can be matched to the chip information files in the chip info bucket.
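As a sketch of the first change, the parameters for a SageMaker runtime invoke_endpoint call could be assembled as below. The JSON payload shape is an assumption; it depends entirely on how your model is served:

```python
import json

def build_inference_request(chip_bucket, chip_key, endpoint_name):
    """Assemble parameters for a SageMaker runtime invoke_endpoint call.

    The payload shape here (a JSON reference to the chip object) is an
    assumption; your endpoint may expect image bytes or another format.
    """
    payload = json.dumps({"bucket": chip_bucket, "key": chip_key})
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": payload,
    }

# In the modified Lambda function:
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(
#       **build_inference_request(chip_bucket, chip_key, endpoint_name))
#   predictions = json.loads(response["Body"].read())
```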
For more serverless learning resources, visit Serverless Land.