AWS Public Sector Blog

Earth observation using AWS Ground Station: A how to guide

We live in an information society where access to data helps drive our decisions and behaviors. For example, recent satellite images from NASA showing the melting of the ice cap on Eagle Island in Antarctica and its impact on wildlife are powerful. They highlight the need to protect endangered species.

Over the past decade, a crop of new companies focused on Earth observation (EO) have made valuable EO data more accessible to a broader audience than ever before. As a result, we are seeing a dramatic increase in EO science.

The value of EO data lies in the ability to monitor change. New instrumentation aboard satellites, such as radar, now allows us to “see” through clouds, enabling observation of any part of the Earth. The reliable, repeatable, and accurate data now flowing from satellites is making operational services powered by satellite imagery a reality.

As the space race to launch fleets of small low Earth orbit (LEO) satellites heats up, technological developments such as smaller satellites and the re-use of launch vehicles have driven down the cost of building and launching satellites. If you are within a satellite's footprint and need to lease transponder capacity on an ad-hoc basis, or if you do not want to invest in costly satellite ground segment infrastructure, consider AWS Ground Station.

AWS Ground Station is a fully managed, pay-per-use service that lets you communicate with satellites without having to buy, lease, build, or maintain your own satellite ground stations. If you are considering using EO data for a startup idea or scientific endeavor, this blog post explains how to use AWS Ground Station to receive and process data from Earth observation satellites. With the solution proposed in this post, you will have usable data in your Amazon Simple Storage Service (Amazon S3) bucket within a few minutes of a successful satellite contact.

Global change monitoring is a reality and AWS Ground Station increases the speed of data access while reducing the cost of data connection. Data received using AWS Ground Station can be further processed using AWS global infrastructure including low-cost storage and web publishing with Amazon S3, real-time streaming using Amazon Kinesis, or machine learning with Amazon SageMaker.

High-level solution overview

AWS Ground Station solution overview

In this blog post, I receive and process data from the AQUA satellite, whose location can be tracked in real time using this link. AQUA launched in 2002 and is part of NASA’s Earth Science Data Systems (ESDS) program. It orbits the Earth in a Sun-synchronous near-polar orbit at an altitude of 705 km, which makes it a LEO (Low Earth Orbit) satellite. AQUA has five active sensing instruments. The ESDS program plays a vital role in the development of validated, global, and interactive Earth system models able to predict global change accurately.

AWS Ground Station currently supports LEO and MEO (Medium Earth Orbit) satellites. Because of their orbital periods, these satellites are visible from the ground only for a few minutes during each pass, and communication is only possible when they are within line of sight of a ground station. AWS Ground Station establishes contact with the satellite, then receives, demodulates, and decodes its radio frequency signal. It then pushes the decoded data to the receiver Amazon Elastic Compute Cloud (Amazon EC2) instance as a VITA 49 stream.

A data capture application running on the receiver EC2 instance ingests the incoming VITA 49 stream. The payload from within each VITA 49 packet is extracted and combined into a raw data file. The raw data file is held locally and also pushed to Amazon S3 to be reprocessed as needed at a later date.

AQUA broadcasts CCSDS-compliant CADU (Channel Access Data Unit) data frames. CADUs are processed into Level 0 data products using NASA’s RT-STPS (Real-Time Software Telemetry Processing System). The Level 0 data is then pushed to S3. IPOPP (International Planetary Observation Processing Package), also provided by NASA, is used to process Level 0 data into higher-level products: for example, Level 1 HDF files, as well as Level 2 HDF files and TIFF images. IPOPP is installed and configured on the processor Amazon EC2 instance, which then pushes the Level 1 and Level 2 data products to S3. Once in S3, data products can be published to data subscribers or retrieved by other AWS services, such as Amazon SageMaker, for near real-time processing.

Read a detailed description of the data processing levels as defined by NASA.

Onboarding a satellite into AWS Ground Station

Ready to downlink your own data from AQUA?

To get started, email aws-groundstation@amazon.com to have AQUA added to Ground Station on your AWS account. Include your AWS account number and reference AQUA and its NORAD ID (27424). Once the satellite is successfully onboarded into your AWS account, it is visible in the Ground Station Console or via the AWS CLI.
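Once onboarding is complete, a quick way to confirm that AQUA is available is to list the satellites from the CLI in a Region that offers AWS Ground Station, for example:

aws groundstation list-satellites --region us-east-2

The output should include an entry for AQUA with NORAD ID 27424.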

Getting ready

In this section, we perform some prerequisite steps:

1. Before starting, check that you have an Amazon Virtual Private Cloud (Amazon VPC) in your AWS Account containing at least one public subnet and an Internet Gateway. The VPC must be located in the Region of the ground station you plan to use. For simplicity, I use a public subnet in this post. However, in a production environment you would probably choose to place your EC2 instances in private subnets.
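If you prefer to check this from the CLI rather than the console, commands like the following list your VPCs and the subnets in a given VPC so you can confirm a public subnet exists (values in angle brackets are placeholders):

aws ec2 describe-vpcs --region <your-region-name>

aws ec2 describe-subnets --region <your-region-name> \
--filters Name=vpc-id,Values=<vpc-id>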

2. You also need an SSH key to connect to the EC2 instances that you will create. If you don’t have one, create it using the EC2 console and download the key to a safe location on your personal computer. Update the permissions on the key by running the command below:

chmod 400 <pem-file>
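Alternatively, if you prefer the CLI to the console, you can create the key pair and save it locally in one step before applying the permissions above (the key and file names are placeholders):

aws ec2 create-key-pair --key-name <SSH-key-name> \
--query 'KeyMaterial' --output text > <pem-file>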

3. Create an S3 bucket with the default configuration in the same Region as the ground station you plan to use. This bucket will hold the software to be installed on the EC2 instances. It will also be used to store the raw, Level 0, Level 1, and Level 2 data. Create this bucket through the S3 console or with the AWS CLI, using the following command:

aws s3 mb s3://<your-bucket-name> --region <your-region-name>

4. Download NASA’s RT-STPS software from the NASA DRL website. You must register on the site first. The following files are required:

1. RT-STPS_6.0.tar.gz

2. RT-STPS_6.0_PATCH_1.tar.gz

3. RT-STPS_6.0_PATCH_2.tar.gz

4. RT-STPS_6.0_PATCH_3.tar.gz

5. Upload the RT-STPS files to the S3 bucket using the S3 console or this CLI command:

aws s3 cp <filename> \
s3://<your-bucket-name>/software/RT-STPS/<filename>

6. Upload the data capture application code (receivedata.py, awsgs.py, and start-data-capture.sh) from the aws-gs-blog S3 bucket to your S3 bucket using the S3 console or the CLI command below.

Note: The source bucket is in the us-east-2 region. If your bucket is in a different region, use both the --source-region and --region CLI parameters in the following commands, i.e. --source-region us-east-2 --region <your-region>

aws s3 cp \
s3://aws-gs-blog/software/data-receiver/<filename> \
s3://<your-bucket-name>/software/data-receiver/<filename>
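For example, to copy all three files in one shell loop (include both region parameters if your bucket is outside us-east-2):

for f in receivedata.py awsgs.py start-data-capture.sh; do
  aws s3 cp \
  s3://aws-gs-blog/software/data-receiver/$f \
  s3://<your-bucket-name>/software/data-receiver/$f \
  --source-region us-east-2 --region <your-region>
done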

CloudFormation Part 1: Creating a stack to define the Mission Profile

Once the satellite is onboarded and the preparations are completed, create a Mission Profile to tell AWS Ground Station how to process the incoming radio frequency signal. You will use an AWS CloudFormation template prepared by the AWS team. The CloudFormation template also installs RT-STPS on the receiver EC2 instance and configures the data capture application to automatically start on the instance.

1. Open the CloudFormation console and choose Create stack, then specify the Amazon S3 URL below, or click this link to enter the URL automatically.

https://aws-gs-blog.s3.us-east-2.amazonaws.com/cfn/aqua-rt-stps.yml

2. Enter the following parameters:

Stack name: <any value> e.g. gs-aqua-retriever

CFTemplateVersion: <leave as default>

CreateReceiverInstance: false

InstanceType: m5.4xlarge

S3Bucket: <your-bucket-name> (The one you created earlier)

SSHCidrBlock: <XXX.XXX.XXX.XXX/32> (Enter the Public IP Address of the computer you are using to connect to the EC2 instance. If needed, get it from https://whatismyip.com. Ensure you add “/32” to the end of the IP address)

SSHKeyName: <SSH-key-name> (The SSH key you are using to connect to the EC2 instance)

SatelliteName: AQUA

SubnetId: <subnet-id> (Select a Public Subnet)

VpcId: <vpc-id> (Select the VPC containing the above public subnet)

3. Select Next, keep the default options, then select Next again.

4. IMPORTANT: on the last screen before confirming the creation of the stack, scroll down and select the checkbox next to “I acknowledge that AWS CloudFormation might create IAM resources”, then select Create stack. Otherwise, CloudFormation returns an error.

5. Wait for the CloudFormation stack to show “CREATE_COMPLETE”.
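If you prefer to script the deployment, the same stack can be created from the AWS CLI. The following is a sketch only: the parameter keys match those listed above, the values are the same placeholders, and you may need to adjust the --capabilities flag depending on the IAM resources the template creates:

aws cloudformation create-stack \
--stack-name gs-aqua-retriever \
--template-url https://aws-gs-blog.s3.us-east-2.amazonaws.com/cfn/aqua-rt-stps.yml \
--capabilities CAPABILITY_IAM \
--parameters \
ParameterKey=CreateReceiverInstance,ParameterValue=false \
ParameterKey=InstanceType,ParameterValue=m5.4xlarge \
ParameterKey=S3Bucket,ParameterValue=<your-bucket-name> \
ParameterKey=SSHCidrBlock,ParameterValue=<XXX.XXX.XXX.XXX/32> \
ParameterKey=SSHKeyName,ParameterValue=<SSH-key-name> \
ParameterKey=SatelliteName,ParameterValue=AQUA \
ParameterKey=SubnetId,ParameterValue=<subnet-id> \
ParameterKey=VpcId,ParameterValue=<vpc-id>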

Scheduling a contact

Schedule a contact with AQUA from the AWS Ground Station console. Select Reserve contacts now, make sure the Satellite catalog number corresponds to AQUA (27424), and pick the ground station location that you want to use. The service displays a list of possible contact times. Pick one and select Reserve contact. The console displays a message confirming that the reservation was successful. You can find additional information about the process in the AWS Ground Station documentation.
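You can also find and reserve contacts from the AWS CLI. The following is a hedged sketch using the list-contacts and reserve-contact commands; the ARNs, ground station name, and times are placeholders that you can look up with list-satellites, list-ground-stations, and list-mission-profiles:

aws groundstation list-contacts \
--status-list AVAILABLE \
--satellite-arn <aqua-satellite-arn> \
--mission-profile-arn <mission-profile-arn> \
--ground-station <ground-station-name> \
--start-time 2020-03-13T00:00:00Z \
--end-time 2020-03-14T00:00:00Z

aws groundstation reserve-contact \
--satellite-arn <aqua-satellite-arn> \
--mission-profile-arn <mission-profile-arn> \
--ground-station <ground-station-name> \
--start-time <contact-start-time> \
--end-time <contact-end-time>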

CloudFormation Part 2: Preparing the Receiver EC2 instance

During a successful contact, the data capture application on the EC2 instance will capture and process the data, and then place the raw data and the processed Level 0 data product into S3. Follow these steps to create and initialize the receiver EC2 instance.

Important: Run these steps at least one hour ahead of the scheduled contact. This allows time to troubleshoot or cancel the contact if needed.

1. Re-open the CloudFormation console and select Stacks.

2. Select the stack you created earlier and select Update.

3. Select Use current template, then select Next.

4. Change the CreateReceiverInstance parameter to ‘true’ and select Next.

5. Leave the default options on the Configure stack options page and select Next.

6. Remember to select the checkbox next to “I acknowledge that AWS CloudFormation might create IAM resources” on the last screen, then select Update stack.

7. If you want to track the installation and configuration of the software on the EC2 instance, SSH into the instance and run the command:

tail -F /var/log/user-data.log

8. After a few minutes, lines similar to the following appear in the user-data.log file, indicating that everything went well and that the data capture application is running:

Starting start-data-capture.sh AQUA s3-bucket-name 2>&1 | tee /opt/aws/groundstation/bin/data-capture_20200313-1121.log
20200313-11:21:40	Satellite: AQUA
20200313-11:21:40	S3 bucket: s3-bucket-name
20200313-11:21:40	Getting RT-STPS software from S3 bucket: s3-bucket-name
…
20200313-11:21:43	Running python3 receivedata.py 20200313-1121-AQUA-raw.bin
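If you prefer to script this update rather than use the console, steps 1 through 6 above are equivalent to a single CLI call. This is a sketch only, assuming the stack name from Part 1 and reusing the previous values for all other parameters:

aws cloudformation update-stack \
--stack-name gs-aqua-retriever \
--use-previous-template \
--capabilities CAPABILITY_IAM \
--parameters \
ParameterKey=CreateReceiverInstance,ParameterValue=true \
ParameterKey=CFTemplateVersion,UsePreviousValue=true \
ParameterKey=InstanceType,UsePreviousValue=true \
ParameterKey=S3Bucket,UsePreviousValue=true \
ParameterKey=SSHCidrBlock,UsePreviousValue=true \
ParameterKey=SSHKeyName,UsePreviousValue=true \
ParameterKey=SatelliteName,UsePreviousValue=true \
ParameterKey=SubnetId,UsePreviousValue=true \
ParameterKey=VpcId,UsePreviousValue=true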

Initial Data Capture and Processing

Once the data capture application starts, you can follow its progress by running this command:

tail -F /opt/aws/groundstation/bin/data-capture*.log

Following a successful contact, you will see lines in the log file indicating that the data was captured and uploaded to S3. Raw data is stored under /data/raw/ in your S3 bucket, for example:

s3://<your-bucket-name>/data/raw/20200109-0912-AQUA-raw.bin

Level 0 data is stored under /data/level0/ in your S3 bucket, for example:

s3://<your-bucket-name>/data/level0/P1540064AAAAAAAAAAAAAA20009094519000.PDS
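You can confirm that both objects arrived with a quick listing:

aws s3 ls s3://<your-bucket-name>/data/raw/

aws s3 ls s3://<your-bucket-name>/data/level0/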

Processing the data further: The Processor EC2 Instance

Level 0 data is the foundation for producing the higher-level data products, and can be processed into Level 1 and Level 2 products. For this, we use NASA’s IPOPP (International Planetary Observation Processing Package). IPOPP’s user guide recommends not installing IPOPP on the same server as RT-STPS, so we install it on a separate instance, which we call the processor instance.

CloudFormation Part 3: Preparing the Processor EC2 instance and processing data into Level 1 data products

Note 1: Use the S3 bucket that already contains the Level 0 data.

Note 2: Before continuing, subscribe to the official CentOS 7 AMI in the AWS Marketplace.

Note 3: The source bucket is in the us-east-2 region. If your bucket is in a different region, use both the --source-region and --region CLI parameters in the following commands, i.e. --source-region us-east-2 --region <your-region>

1. Copy the required files (ipopp-ingest.sh, install-ipopp.sh, downloader_ipopp_4.0.sh) from the aws-gs-blog S3 bucket to your S3 bucket manually or using this CLI command:

aws s3 cp \
s3://aws-gs-blog/software/IPOPP/<filename> \
s3://<your-bucket-name>/software/IPOPP/<filename>

2. Copy the required file IMAPP_3.1.1_SPA_1.4_PATCH_2.tar.gz from the aws-gs-blog S3 bucket to your S3 bucket manually or using this CLI command:

aws s3 cp \
s3://aws-gs-blog/software/IMAPP/IMAPP_3.1.1_SPA_1.4_PATCH_2.tar.gz \
s3://<your-bucket-name>/software/IMAPP/IMAPP_3.1.1_SPA_1.4_PATCH_2.tar.gz

3. Create the CloudFormation stack. Open the CloudFormation console, choose Create stack, and specify the following Amazon S3 URL, or use this link to enter the URL automatically.

https://aws-gs-blog.s3.us-east-2.amazonaws.com/cfn/ipopp-instance.yml

4. Enter the following parameters:

Stack name: <any value> e.g. gs-aqua-processor

InstanceType: m5.xlarge is sufficient for most SPAs (Science Processing Algorithms). However, use m5.4xlarge to run the Blue Marble MODIS Sharpened Natural/True Color SPAs.

S3Bucket: <your-bucket-name> (The one you created earlier and that now contains L0 data)

SSHCidrBlock: <XXX.XXX.XXX.XXX/32> (Enter the Public IP Address of the computer you are using to connect to the EC2 instance. If needed, get it from https://whatismyip.com. Ensure you add “/32” to the end of the IP address)

SSHKeyName: <SSH-key-name> (The SSH key you are using to connect to the EC2 instance)

SatelliteName: AQUA

SubnetId: <subnet-id> (Select a Public Subnet)

VpcId: <vpc-id> (Select the VPC containing the above public subnet)

IpoppPassword: <Enter a password for the ipopp user on CentOS within the processor EC2 instance>

5. Leave the default options on the Configure stack options page and select Next.

6. Select the checkbox next to “I acknowledge that AWS CloudFormation might create IAM resources”, then select Create stack.

7. Wait for the CloudFormation stack to show “CREATE_COMPLETE”.

You can check progress by connecting to the EC2 instance over SSH and running the command:

tail -F /var/log/user-data.log

The log file shows how the CloudFormation template downloads and configures the required software, and then pulls the Level 0 data from S3. The data is ingested into IPOPP and processed by the SPAs into higher-level products. Finally, the higher-level products are uploaded to S3. During the process, IPOPP automatically downloads ancillary files, so ensure the EC2 instance has internet access.

Once the Level 0 data is successfully ingested, the user-data log will display something similar to the following:

Scanning /home/ipopp/drl/data/dsm/ingest for Aqua Level 0 files
    -------------------------------------------------------------------------------------------
    Found P1540064AAAAAAAAAAAAAA20009094519001.PDS ...
    Found matching P1540064AAAAAAAAAAAAAA20009094519000.PDS

By default, IPOPP enables only the SPAs that process the Level 0 data into Level 1 data products. These can be found in the following location:

s3://<your-bucket-name>/data/AQUA/modis/level1

Within the instance, Level 1 data products can be found in the following location:

$HOME/drl/data/pub/gsfcdata/aqua/modis/level1

These are:

  1. MYD01.20006185715.hdf (MODIS/Aqua Level 1A Scans of raw radiances in counts)
  2. MYD021KM.20006185715.hdf (MODIS/Aqua Level 1B Calibrated Radiances – 1km)
  3. MYD02HKM.20006185715.hdf (MODIS/Aqua Level 1B Calibrated Radiances – 500m)
  4. MYD02QKM.20006185715.hdf (MODIS/Aqua Level 1B Calibrated Radiances – 250m)
  5. MYD03.20006185715.hdf (MODIS/Aqua Geolocation – 1km)

Note: The 500m and 250m resolution files are often not produced from a nighttime satellite contact.
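A quick way to confirm the upload is to list the Level 1 prefix in your bucket:

aws s3 ls s3://<your-bucket-name>/data/AQUA/modis/level1/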

To enable additional SPAs and produce Level 2 data products, configure IPOPP through its graphical dashboard. First, connect to the processor instance with a VNC client by following the instructions for your operating system below.

Linux / Mac Instructions

1. Run the command below to connect to the EC2 instance using SSH and tunnel the VNC traffic over the SSH session.

ssh -L 5901:localhost:5901 -i <path to pem file> centos@<public ip address of EC2 instance>

2. Open the Tiger VNC Client application on your PC and connect to ‘localhost:1’

3. When prompted, enter the ipopp password you provided to the CloudFormation template in the earlier step.

Windows Instructions

1. Download the open-source SSH client PuTTY here

2. Open PuTTY and enter the public IP of the EC2 instance in the Session->Host Name (or IP Address) field

3. Enter ‘centos’ in Connection->Data->Auto-login username

4. In Connection->SSH->Auth, browse to the correct PPK key file (private SSH key) for the EC2 instance

5. In Connection->SSH->Tunnels, enter 5901 in Source port, enter localhost:5901 in Destination, click Add

6. Click Session, enter a friendly name in Saved Sessions, then click Save

7. Click Open to open the tunneled SSH session

8. Open the Tiger VNC Client application on your PC and connect to ‘localhost:1’

9. When prompted, enter the ipopp password you provided to the CloudFormation template in the earlier step

Note 1: If the VNC client can’t connect to the processor instance, make sure that the VNC server is running on the instance by following the procedure below:

i. Switch to the ipopp user in the SSH shell, using the password that you defined while creating the CloudFormation stack in CloudFormation Part 3.

su - ipopp

ii. Find the active vncserver process by running the command:

vncserver -list

iii. If no vncserver process is running, start one with the command:

vncserver

Note 2: If the VNC client is able to connect to the EC2 instance but only displays a black screen, kill the vncserver process and restart it as the ipopp user using these commands:

su - ipopp
vncserver -kill <display> (e.g. ‘:1’)
vncserver

1. Once you have connected to the graphical interface on the processor instance via VNC, log in using the ipopp user and the password specified in CloudFormation Part 3. Once logged in, open a terminal and run the following command to start up the IPOPP configuration dashboard:

~/drl/tools/dashboard.sh &

2. In the dashboard, select Mode->Configuration Editor.

3. Select Actions->Configure Projection, select Stereographic, then select Configure Projection.

4. Enable additional SPA modules by clicking on them. You may want to enable all the SPAs in your initial experimentation in order to see the data products created by each SPA.

5. Once you finish enabling SPAs, select Actions->Save IPOPP Configuration.

6. To start the SPA services and track the progress visually, select Mode->IPOPP Process Monitor, confirm, then click Actions->Start SPA Services and confirm.

Each SPA, when started, checks for the input data it requires and processes it. The resulting Level 2 data products for AQUA can be found on S3 in the following path:

s3://<your-bucket-name>/data/AQUA/modis/level2

Within the instance, Level 2 data products can be found in the following path:

$HOME/drl/data/pub/gsfcdata/aqua/modis/level2

Once the configuration process is completed, it doesn’t need to be repeated. IPOPP automatically starts the additional SPAs with each ingest. If you capture data from additional satellite passes, you can trigger the retrieval of Level 0 data from S3 and the IPOPP processing by either rebooting the EC2 instance or running the following command on the instance:

/opt/aws/groundstation/bin/ipopp-ingest.sh AQUA <S3_BUCKET>

Note: This command must be run as the ipopp user; remember to switch to this user first with:

su - ipopp

Once the IPOPP processing completes, Level 1 and Level 2 data products are uploaded to S3.

After executing the ipopp-ingest script, you can track its progress with the following command:

tail -F /opt/aws/groundstation/bin/ipopp-ingest.log

 

Viewing the Level 2 data products

Level 2 data products produced by IPOPP are HDF and TIFF files. HDF (Hierarchical Data Format) files are data-rich files that can be browsed using software such as HDFView. However, the most common use of HDF files is to process them programmatically, as they contain the actual data rather than the visual representations you find in the TIFF files. If you plan to perform machine learning on the data, you’ll likely use HDF files. If you are looking for visual representations, use the TIFF images, or process the Level 2 data further into Level 3 files: for example, KMZ files that can be imported into Google Earth.
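As an illustration of that Level 3 step, here is one way to turn a Level 2 GeoTIFF into a KMZ that Google Earth can open. This sketch uses GDAL's KMLSUPEROVERLAY driver as a stand-in for the polar2grid workflow referenced below, and the file names are placeholders:

aws s3 cp s3://<your-bucket-name>/data/AQUA/modis/level2/<crefl-image>.tif .

gdal_translate -of KMLSUPEROVERLAY <crefl-image>.tif <crefl-image>.kmz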

Cropped True Colour Corrected Reflectance (CREFL) image showing the United Arab Emirates and Oman, produced using IPOPP’s CREFL SPA. If you look carefully, you might see the Palm Jumeirah in the centre-left of the image, off the coast of Dubai.

 

Level 3 KMZ package created using the polar2grid software and imported into Google Earth.

The data for both images was captured when the AQUA satellite passed over the AWS Ground Station in Bahrain. These images are a snapshot of the actual data available from the AQUA satellite. Other data includes sea and land surface temperatures, aerosol levels in the atmosphere, chlorophyll concentrations in the sea (which are indicators of marine life), and more.

Summary

With AWS Ground Station, you can capture data from satellites without having to buy, lease, build, or maintain your own ground station equipment. Get started with Earth observation data and AWS Ground Station, and learn more about AWS Professional Services.