AWS Government, Education, & Nonprofits Blog

Keeping a SpatioTemporal Asset Catalog (STAC) Up To Date with SNS/SQS

A guest post by Frederico Liporace, Software Development Director at AMS Kepler


The SpatioTemporal Asset Catalog (STAC) specification aims to standardize the way geospatial assets are exposed online and queried. The China-Brazil Earth Resources Satellites (CBERS) are the result of a cooperation agreement, started in 1988, between the Brazilian and Chinese space agencies (INPE and CAST, respectively). Since then, five satellites have been launched (CBERS-1, 2, 2B, 3, and 4).

The mission generates images of Earth with characteristics similar to those of USGS' Landsat and ESA's Sentinel-2 missions. In 2004, INPE announced that all CBERS-2 images would be available to the public at no charge, the first time this distribution model was used for medium-resolution satellite imagery. This model is now used for all CBERS satellite images.

This post describes how the CBERS-4 data stored on AWS implements STAC for its satellite imagery archive. The archive consists of imagery from the MUX, AWFI, and PAN cameras, which acquire Red, Green, Blue, NIR, and Panchromatic images at resolutions from 64 m to 5 m, with revisit cycles ranging from five to 52 days. The architecture described here can be generalized to similar, constantly updating data sources that require continuous metadata generation. It is designed to consume the services provided by CBERS on AWS.

 

Generating the STAC Item

The STAC Item is a GeoJSON feature, with additional properties, that describes each atomic data object to be discovered. The GeoJSON specification defines a standard for encoding a variety of geographic data structures as JSON documents. The listing below shows an excerpt of the STAC item for a CBERS-4 MUX scene. Note the geographic information about the scene's bounding box and geometry, which enables geography-based search.

{
  "id": "CBERS_4_MUX_20170618_057_121_L2",
  "type": "Feature",
  "bbox": [
    47.930129,
    -19.401998,
    49.329281,
    -18.16659
  ],
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": []
  },
  "properties": {
    "datetime": "2017-06-18T07:02:39Z",
    "provider": "INPE",
    "eo:sun_azimuth": 31.6748,
    "eo:sun_elevation": 41.0625,
    "eo:epsg": 32651,
    "cbers:data_type": "L2",
    "cbers:path": 57,
    "cbers:row": 121,
  },
  "links": {
    "self": {},
    "catalog": {}
  },
  "assets": {
    "thumbnail": {},
    "metadata": {},
    "B5": {}
  }
}

The last step of each CBERS-4 scene ingestion into AWS is the generation of a quicklook, or thumbnail, image. The availability of the quicklook is published to a public Amazon Simple Notification Service (SNS) topic. There is a separate topic for each CBERS-4 camera, so downstream subscribers can listen for notifications for only the sensor they are interested in.

The STAC-generation chain starts when the New Scenes Queue, which is subscribed to the quicklook SNS topics, receives a message indicating that a new scene has been ingested (Figure 1 shows the MUX and AWFI topics as examples). A STAC item generator AWS Lambda function consumes the messages from the queue. In this architecture, we use the recently launched ability to trigger Lambda functions directly from Amazon Simple Queue Service (SQS). Triggering the Lambda functions directly from SQS removes the need to explicitly poll the queue for new messages, automatically scales the number of function instances with the number of messages in the queue, and provides a robust error handling mechanism, since messages whose Lambda processing fails are automatically returned to the queue.

The STAC item generator Lambda function reads the original CBERS scene's metadata, an XML file, from the publicly available cbers-pds bucket, generates a corresponding STAC item, and stores the item in another publicly available bucket, cbers-stac. The processing of each item is completely independent, so items may be processed in parallel, taking advantage of the automatic scaling performed by the SQS/Lambda integration.
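A minimal sketch of what such a handler might look like, assuming boto3 and a hypothetical build_stac_item helper (the message fields and key mapping are illustrative, not the exact notification format):

import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Each SQS record wraps an SNS notification announcing a new scene;
    # raising on failure returns the batch to the queue for retry.
    for record in event["Records"]:
        notification = json.loads(record["body"])
        message = json.loads(notification["Message"])
        xml_key = message["key"]  # illustrative field: location of the scene's XML metadata

        # Read the original scene metadata from the public cbers-pds bucket
        xml_body = s3.get_object(Bucket="cbers-pds", Key=xml_key)["Body"].read()

        # Hypothetical helper that parses the XML and returns a STAC item dict
        stac_item = build_stac_item(xml_body)

        # Store the STAC item in the public cbers-stac bucket
        stac_key = xml_key.replace(".xml", ".json")  # illustrative key mapping
        s3.put_object(
            Bucket="cbers-stac",
            Key=stac_key,
            Body=json.dumps(stac_item),
            ContentType="application/json",
        )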

Each generated STAC item is also published to a public CBERS STAC SNS topic. The published SNS message body is the STAC item JSON itself, accompanied by message attributes, such as the scene's datetime and geographic bounding box, that allow basic geographic filtering by listeners. We later show how the SNS topic may be used to update a geospatial search mechanism.
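A minimal sketch of how an item could be published with such attributes, assuming boto3 (the topic ARN is passed in by the caller; the attribute names match the filter policy shown later in this post):

import json
import boto3

sns = boto3.client("sns")

def publish_stac_item(stac_item, topic_arn):
    # The message body is the STAC item itself; the bounding box is copied
    # into message attributes so subscribers can apply geographic filter policies.
    ll_lon, ll_lat, ur_lon, ur_lat = stac_item["bbox"]
    sns.publish(
        TopicArn=topic_arn,
        Message=json.dumps(stac_item),
        MessageAttributes={
            "bbox.ll_lon": {"DataType": "Number", "StringValue": str(ll_lon)},
            "bbox.ll_lat": {"DataType": "Number", "StringValue": str(ll_lat)},
            "bbox.ur_lon": {"DataType": "Number", "StringValue": str(ur_lon)},
            "bbox.ur_lat": {"DataType": "Number", "StringValue": str(ur_lat)},
        },
    )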

The Static STAC Catalog

STAC items may be organized in Static STAC catalogs, which we will call catalogs. The “Static STAC Catalog Update” box of the architecture shown in Figure 1 is responsible for generating and updating the catalogs.

Each catalog is a JSON file that references STAC items and/or other catalogs, along with other links. Catalogs provide a hierarchy for STAC items and may be browsed in a standard way by tools such as the Radiant Earth Foundation's STAC browser. An instance of the STAC browser serving a CBERS on AWS static catalog is available here.

We chose to organize the items using the CBERS-4 Reference Grid System, which associates each scene with a path and a row number. The reference system is constructed so that scenes acquired in a single satellite pass share the same path, with increasing row numbers. The figure below shows the reference grid footprint, as displayed by Remote Pixel's viewer.

The choice of grid system means that our STAC items' S3 keys are defined as {CAMERA}/{PATH}/{ROW}/{SCENE_ID}.json, and at each level we have a catalog file that references the child catalogs or STAC items. The listing below shows an excerpt of the catalog for CBERS-4 MUX path 057; note that one child catalog is defined for each row:

{
  "name": "CBERS4 MUX 057",
  "description": "CBERS4 MUX camera path 057 catalog",
  "links": [
    {
      "rel": "self",
      "href": "catalog.json"
    },
    {
      "rel": "parent",
      "href": "../catalog.json"
    },
    {
      "rel": "child",
      "href": "120/catalog.json"
    },
    {
      "rel": "child",
      "href": "121/catalog.json"
    },
    {
      "rel": "child",
      "href": "122/catalog.json"
    }
  ]
}

An excerpt from the catalog for MUX path 057, row 121 is shown below. This catalog links to all available scenes for that path and row:

{
  "name": "CBERS4 MUX 057/121",
  "description": "CBERS4 MUX camera path 057 row 121 catalog",
  "links": [
    {
      "rel": "self",
      "href": "catalog.json"
    },
    {
      "rel": "parent",
      "href": "../catalog.json"
    },
    {
      "rel": "item",
      "href": "CBERS_4_MUX_20180713_057_121_L2.json"
    },
    {
      "rel": "item",
      "href": "CBERS_4_MUX_20180808_057_121_L2.json"
    },
    {
      "rel": "item",
      "href": "CBERS_4_MUX_20171121_057_121_L2.json"
    },
    {
      "rel": "item",
      "href": "CBERS_4_MUX_20180617_057_121_L2.json"
    }
  ]
}

The catalogs could be updated directly by the STAC item generator Lambda function, but this could result in many repeated operations when one would suffice. Say, for instance, that a new STAC item is stored under the key MUX/057/121/CBERS_4_MUX_20180617_057_121_L2.json. We would have to check and update catalogs at three levels (a small sketch after the list derives these levels from the scene identifier):

  1. MUX/ level, since 057 could be a new PATH not yet included in the catalog.
  2. MUX/057 level, since 121 could be a new ROW under path 057.
  3. MUX/057/121 level, since CBERS_4_MUX_20180617_057_121_L2.json is a new scene and must be inserted into the 057/121 catalog.
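For illustration, these three levels can be derived directly from the scene identifier; a minimal sketch (the helper name is ours):

def catalog_levels(scene_id):
    # e.g. CBERS_4_MUX_20180617_057_121_L2 -> ["MUX/", "MUX/057/", "MUX/057/121/"]
    _, _, camera, _, path, row, _ = scene_id.split("_")
    return [f"{camera}/", f"{camera}/{path}/", f"{camera}/{path}/{row}/"]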

Because of the way scenes are acquired, there is a high probability that CBERS_4_MUX_20180617_057_121_L2 is generated in tandem with other scenes in the same path but with distinct rows, for instance CBERS_4_MUX_20180617_057_122_L2. In that case the first two updates listed above would be repeated, one right after the other, and the duplication gets worse as more scenes share the same path.

To optimize this operation, we adopt a 'lazy' approach: instead of updating the catalogs for each STAC item, we write the levels that need to be updated to an Amazon DynamoDB table, where repeated levels are simply overwritten. The Move Catalog Level Keys Lambda function, triggered by a scheduled Amazon CloudWatch event, moves all keys that need to be updated to the Catalogs to be Updated SQS queue, which triggers instances of the UpdateSingleCatalog Lambda function to update the catalog levels in parallel. Each catalog is rebuilt from the output of an S3 LIST operation on the appropriate prefix of the S3 bucket.
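The two sides of this lazy approach could look roughly like the sketch below, assuming boto3, an illustrative DynamoDB table name, and the cbers-stac key layout described earlier:

import json
import boto3

dynamodb = boto3.client("dynamodb")
s3 = boto3.client("s3")

def mark_level_for_update(prefix):
    # Record that a catalog level needs rebuilding; writing the same prefix
    # twice simply overwrites the existing item, which is the point of the approach.
    dynamodb.put_item(
        TableName="catalog-levels-to-update",  # illustrative table name
        Item={"prefix": {"S": prefix}},
    )

def rebuild_catalog(prefix):
    # Rebuild one catalog level from an S3 LIST of its prefix, roughly what
    # the UpdateSingleCatalog Lambda function does.
    paginator = s3.get_paginator("list_objects_v2")
    links = [{"rel": "self", "href": "catalog.json"}]
    for page in paginator.paginate(Bucket="cbers-stac", Prefix=prefix, Delimiter="/"):
        for child in page.get("CommonPrefixes", []):  # child catalogs
            links.append({"rel": "child",
                          "href": child["Prefix"][len(prefix):] + "catalog.json"})
        for obj in page.get("Contents", []):  # STAC items at this level
            key = obj["Key"][len(prefix):]
            if key.endswith(".json") and key != "catalog.json":
                links.append({"rel": "item", "href": key})
    catalog = {"name": prefix.rstrip("/"), "links": links}
    s3.put_object(Bucket="cbers-stac", Key=prefix + "catalog.json",
                  Body=json.dumps(catalog), ContentType="application/json")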

The STAC Catalog API

The STAC standard also defines a RESTful API for responding to geospatial queries. The CBERS STAC topic may be used by a service that implements the STAC RESTful API on top of the Elasticsearch engine. The figure below shows how this is implemented.

STAC Items To Be Ingested is an SQS queue subscribed to the CBERS STAC SNS topic presented above. We again use the Lambda/SQS trigger to instantiate the Insert into Elastic Lambda functions, which consume STAC items from the queue and insert them into an Elasticsearch instance. A significant advantage of this design is that it may consume STAC-compliant items from any source, without the burden of dealing with distinct metadata formats. The figure shows possible SNS topics for Landsat and Sentinel satellites, in addition to CBERS.
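A minimal sketch of such an ingest function, assuming the elasticsearch Python client, an illustrative endpoint, and an index whose geometry field is mapped as geo_shape:

import json
from elasticsearch import Elasticsearch

es = Elasticsearch("https://search-stac.example.com")  # illustrative endpoint

def handler(event, context):
    # Consume STAC items delivered by the SQS trigger and index them by id,
    # so reprocessing the same scene simply overwrites the document.
    for record in event["Records"]:
        stac_item = json.loads(json.loads(record["body"])["Message"])
        es.index(index="stac-items", id=stac_item["id"], body=stac_item)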

Basic geographic filtering of SNS messages

Some STAC item properties, such as the bounding box coordinates, are also sent as SNS message attributes. This enables each topic subscriber to filter the messages and receive only items that cover a point of interest. The listing below shows an SNS filter policy that uses the scene's bounding box message attributes to notify a particular subscriber only when items over Rio de Janeiro (22.9068° S, 43.1729° W) are available.

{
	"bbox.ll_lon": [{"numeric":["<=",-43.1729]}],
	"bbox.ur_lon": [{"numeric":[">=",-43.1729]}],
	"bbox.ll_lat": [{"numeric":["<=",-22.9068]}],
	"bbox.ur_lat": [{"numeric":[">=",-22.9068]}]
}
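A subscriber could attach this policy when subscribing an endpoint to the topic; a minimal sketch assuming boto3 (the ARNs are illustrative):

import json
import boto3

sns = boto3.client("sns")

filter_policy = {
    "bbox.ll_lon": [{"numeric": ["<=", -43.1729]}],
    "bbox.ur_lon": [{"numeric": [">=", -43.1729]}],
    "bbox.ll_lat": [{"numeric": ["<=", -22.9068]}],
    "bbox.ur_lat": [{"numeric": [">=", -22.9068]}],
}

sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:cbers-stac",  # illustrative ARN
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:rio-scenes",  # illustrative ARN
    Attributes={"FilterPolicy": json.dumps(filter_policy)},
)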

Reprocessing and Reconciliation

In the previous sections, we detailed how the architecture keeps the catalog up to date. Sometimes we may need to reprocess the whole catalog from scratch, for instance when we want to support a new STAC metadata version or reconcile the catalog with the original S3 bucket contents.

The Reprocess/Reconcile box in Figure 1 shows how this is implemented: a simple Lambda function, Generate Messages from XML Files, scans the cbers-pds S3 bucket and generates a message for each available XML metadata file. Each message is built in the same format that is received from the SNS notification and then sent to the New Scenes Queue.
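A minimal sketch of that generator, assuming boto3 and that the messages mimic the SNS envelope consumed by the STAC item generator (the queue URL, bucket prefix, and message fields are illustrative):

import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/new-scenes"  # illustrative

def handler(event, context):
    # Scan the public cbers-pds bucket and enqueue one message per XML metadata file.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="cbers-pds", Prefix="CBERS4/"):  # illustrative prefix
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith(".xml"):
                continue
            # Wrap the key in the same envelope an SNS subscription would deliver
            body = json.dumps({"Message": json.dumps({"key": obj["Key"]})})
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)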

Scaling

Note that in the reprocessing and reconciliation scenario, the workload is much higher than in the update scenario presented earlier: it changes from tens of scenes per day to hundreds of scenes in a few seconds.

The architecture scales accordingly by:

  • Automatically creating more Lambda instances to consume from the New Scenes Queue.
  • Using the DynamoDB on-demand option (a one-call sketch of this change follows the list). Figure 4 shows the write capacity for the DynamoDB table when the system transitioned from provisioned capacity to the on-demand model. The previously provisioned capacity is shown in red; although adequate for the update scenario, it would throttle writes during the reconcile operations performed, shown in blue.
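Switching an existing table to on-demand is a single call; a minimal sketch assuming boto3 and an illustrative table name:

import boto3

dynamodb = boto3.client("dynamodb")

# Let write capacity scale automatically with the reconcile workload
dynamodb.update_table(
    TableName="catalog-levels-to-update",  # illustrative table name
    BillingMode="PAY_PER_REQUEST",
)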

The Catalog API requires special attention to achieve effective scaling. The Elasticsearch service may become overloaded and unable to keep up with the rate of incoming STAC items. This does not result in data loss, since messages that the Lambda function fails to process are automatically returned to the queue, but it may cause a large number of retries.

A better approach is to limit the load on the Elasticsearch instance by constraining the number of concurrent executions of the Insert into Elastic Lambda function to a level that the particular Elasticsearch instance can handle.
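A minimal sketch of setting such a limit, assuming boto3, an illustrative function name, and an illustrative concurrency value:

import boto3

lambda_client = boto3.client("lambda")

# Cap concurrent executions so the Elasticsearch instance never receives more
# parallel inserts than it can handle; 10 is an illustrative limit.
lambda_client.put_function_concurrency(
    FunctionName="insert-into-elastic",  # illustrative function name
    ReservedConcurrentExecutions=10,
)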


The work described in this post is supported by the AWS Cloud Credits for Research program.