AWS Machine Learning Blog

Monitoring Lake Mead drought using the new Amazon SageMaker geospatial capabilities

Earth’s changing climate poses an increased risk of drought due to global warming. Since 1880, the global temperature has increased 1.01 °C. Since 1993, sea levels have risen 102.5 millimeters. Since 2002, the land ice sheets in Antarctica have been losing mass at a rate of 151.0 billion metric tons per year. In 2022, the Earth’s atmosphere contains more than 400 parts per million of carbon dioxide, which is 50% more than it had in 1750. While these numbers might seem removed from our daily lives, the Earth has been warming at an unprecedented rate over the past 10,000 years [1].

In this post, we use the new geospatial capabilities in Amazon SageMaker to monitor drought caused by climate change in Lake Mead. Lake Mead is the largest reservoir in the US. It supplies water to 25 million people in the states of Nevada, Arizona, and California [2]. Research shows that the water levels in Lake Mead are at their lowest level since 1937 [3]. We use the geospatial capabilities in SageMaker to measure the changes in water levels in Lake Mead using satellite imagery.

Data access

The new geospatial capabilities in SageMaker offer easy access to geospatial data such as Sentinel-2 and Landsat 8. Built-in geospatial dataset access saves weeks of effort otherwise lost to collecting data from various data providers and vendors.

First, we will use an Amazon SageMaker Studio notebook with a SageMaker geospatial image by following steps outlined in Getting Started with Amazon SageMaker geospatial capabilities. We use a SageMaker Studio notebook with a SageMaker geospatial image for our analysis.

The notebook used in this post can be found in the amazon-sagemaker-examples GitHub repo. SageMaker geospatial makes the data query extremely easy. We will use the following code to specify the location and timeframe for satellite data.

In the following code snippet, we first define an AreaOfInterest (AOI) with a bounding box around the Lake Mead area. We use the TimeRangeFilter to select data from January 2021 to July 2022. However, the area we are studying may be obscured by clouds. To obtain mostly cloud-free imagery, we choose a subset of images by setting the upper bound for cloud coverage to 1%.

import boto3
import sagemaker
import sagemaker_geospatial_map

session = boto3.Session()
execution_role = sagemaker.get_execution_role()
sg_client = session.client(service_name="sagemaker-geospatial")

search_rdc_args = {
    "Arn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",  # sentinel-2 L2A COG
    "RasterDataCollectionQuery": {
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-114.529, 36.142],
                            [-114.373, 36.142],
                            [-114.373, 36.411],
                            [-114.529, 36.411],
                            [-114.529, 36.142],
                        ] 
                    ]
                }
            } # data location
        },
        "TimeRangeFilter": {
            "StartTime": "2021-01-01T00:00:00Z",
            "EndTime": "2022-07-10T23:59:59Z",
        }, # timeframe
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 1}}}],
            "LogicalOperator": "AND",
        },
        "BandFilter": ["visual"],
    },
}

tci_urls = []
data_manifests = []
while search_rdc_args.get("NextToken", True):
    search_result = sg_client.search_raster_data_collection(**search_rdc_args)
    if search_result.get("NextToken"):
        data_manifests.append(search_result)
    for item in search_result["Items"]:
        tci_url = item["Assets"]["visual"]["Href"]
        print(tci_url)
        tci_urls.append(tci_url)

    search_rdc_args["NextToken"] = search_result.get("NextToken")

Model inference

After we identify the data, the next step is to extract water bodies from the satellite images. Typically, we would need to train a land cover segmentation model from scratch to identify different categories of physical materials on the earth surface’s such as water bodies, vegetation, snow, and so on. Training a model from scratch is time consuming and expensive. It involves data labeling, model training, and deployment. SageMaker geospatial capabilities provide a pre-trained land cover segmentation model. This land cover segmentation model can be run with a simple API call.

Rather than downloading the data to a local machine for inferences, SageMaker does all the heavy lifting for you. We simply specify the data configuration and model configuration in an Earth Observation Job (EOJ). SageMaker automatically downloads and preprocesses the satellite image data for the EOJ, making it ready for inference. Next, SageMaker automatically runs model inference for the EOJ. Depending on the workload (the number of images run through model inference), the EOJ can take several minutes to a few hours to finish. You can monitor the job status using the get_earth_observation_job function.

# Perform land cover segmentation on images returned from the sentinel dataset.
eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-114.529, 36.142],
                            [-114.373, 36.142],
                            [-114.373, 36.411],
                            [-114.529, 36.411],
                            [-114.529, 36.142],
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2021-01-01T00:00:00Z",
            "EndTime": "2022-07-10T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 1}}}],
            "LogicalOperator": "AND",
        },
    }
}
eoj_config = {"LandCoverSegmentationConfig": {}}

response = sg_client.start_earth_observation_job(
    Name="lake-mead-landcover",
    InputConfig=eoj_input_config,
    JobConfig=eoj_config,
    ExecutionRoleArn=execution_role,
)

# Monitor the EOJ status.
eoj_arn = response["Arn"]
job_details = sg_client.get_earth_observation_job(Arn=eoj_arn)
{k: v for k, v in job_details.items() if k in ["Arn", "Status", "DurationInSeconds"]}

Visualize results

Now that we have run model inference, let’s visually inspect the results. We overlay the model inference results on input satellite images. We use Foursquare Studio tools that comes pre-integrated with SageMaker to visualize these results. First, we create a map instance using the SageMaker geospatial capabilities to visualize input images and model predictions:

# Creates an instance of the map to add EOJ input/ouput layer.
map = sagemaker_geospatial_map.create_map({"is_raster": True})
map.set_sagemaker_geospatial_client(sg_client)

# Render the map.
map.render()

When the interactive map is ready, we can render input images and model outputs as map layers without needing to download the data. Additionally, we can give each layer a label and select the data for a particular date using TimeRangeFilter:

# Visualize AOI
config = {"label": "Lake Mead AOI"}
aoi_layer = map.visualize_eoj_aoi(Arn=eoj_arn, config=config)

# Visualize input.
time_range_filter = {
    "start_date": "2022-07-01T00:00:00Z",
    "end_date": "2022-07-10T23:59:59Z",
}
config = {"label": "Input"}
input_layer = map.visualize_eoj_input(
    Arn=eoj_arn, config=config, time_range_filter=time_range_filter
)

# Visualize output, EOJ needs to be in completed status.
time_range_filter = {
    "start_date": "2022-07-01T00:00:00Z",
    "end_date": "2022-07-10T23:59:59Z",
}
config = {"preset": "singleBand", "band_name": "mask"}
output_layer = map.visualize_eoj_output(
    Arn=eoj_arn, config=config, time_range_filter=time_range_filter
)

We can verify that the area marked as water (bright yellow in the following map) accurately corresponds with the water body in Lake Mead by changing the opacity of the output layer.

Post analysis

Next, we use the export_earth_observation_job function to export the EOJ results to an Amazon Simple Storage Service (Amazon S3) bucket. We then run a subsequent analysis on the data in Amazon S3 to calculate the water surface area. The export function makes it convenient to share results across teams. SageMaker also simplifies dataset management. We can simply share the EOJ results using the job ARN, instead of crawling thousands of files in the S3 bucket. Each EOJ becomes an asset in the data catalog, as results can be grouped by the job ARN.

sagemaker_session = sagemaker.Session()
s3_bucket_name = sagemaker_session.default_bucket()  # Replace with your own bucket if needed
s3_bucket = session.resource("s3").Bucket(s3_bucket_name)
prefix = "eoj_lakemead"  # Replace with the S3 prefix desired
export_bucket_and_key = f"s3://{s3_bucket_name}/{prefix}/"

eoj_output_config = {"S3Data": {"S3Uri": export_bucket_and_key}}
export_response = sg_client.export_earth_observation_job(
    Arn=eoj_arn,
    ExecutionRoleArn=execution_role,
    OutputConfig=eoj_output_config,
    ExportSourceImages=False,
)

Next, we analyze changes in the water level in Lake Mead. We download the land cover masks to our local instance to calculate water surface area using open-source libraries. SageMaker saves the model outputs in Cloud Optimized GeoTiff (COG) format. In this example, we load these masks as NumPy arrays using the Tifffile package. The SageMaker Geospatial 1.0 kernel also includes other widely used libraries like GDAL and Rasterio.

Each pixel in the land cover mask has a value between 0-11. Each value corresponds to a particular class of land cover. Water’s class index is 6. We can use this class index to extract the water mask. First, we count the number of pixels that are marked as water. Next, we multiply that number by the area that each pixel covers to get the surface area of the water. Depending on the bands, the spatial resolution of a Sentinel-2 L2A image is 10m, 20m, or 60m. All bands are downsampled to a spatial resolution of 60 meters for the land cover segmentation model inference. As a result, each pixel in the land cover mask represents a ground area of 3600 m2, or 0.0036 km2.

import os
from glob import glob
import cv2
import numpy as np
import tifffile
import matplotlib.pyplot as plt
from urllib.parse import urlparse
from botocore import UNSIGNED
from botocore.config import Config

# Download land cover masks
mask_dir = "./masks/lake_mead"
os.makedirs(mask_dir, exist_ok=True)
image_paths = []
for s3_object in s3_bucket.objects.filter(Prefix=prefix).all():
    path, filename = os.path.split(s3_object.key)
    if "output" in path:
        mask_name = mask_dir + "/" + filename
        s3_bucket.download_file(s3_object.key, mask_name)
        print("Downloaded mask: " + mask_name)

# Download source images for visualization
for tci_url in tci_urls:
    url_parts = urlparse(tci_url)
    img_id = url_parts.path.split("/")[-2]
    tci_download_path = image_dir + "/" + img_id + "_TCI.tif"
    cogs_bucket = session.resource(
        "s3", config=Config(signature_version=UNSIGNED, region_name="us-west-2")
    ).Bucket(url_parts.hostname.split(".")[0])
    cogs_bucket.download_file(url_parts.path[1:], tci_download_path)
    print("Downloaded image: " + img_id)

print("Downloads complete.")

image_files = glob("images/lake_mead/*.tif")
mask_files = glob("masks/lake_mead/*.tif")
image_files.sort(key=lambda x: x.split("SQA_")[1])
mask_files.sort(key=lambda x: x.split("SQA_")[1])
overlay_dir = "./masks/lake_mead_overlay"
os.makedirs(overlay_dir, exist_ok=True)
lake_areas = []
mask_dates = []

for image_file, mask_file in zip(image_files, mask_files):
    image_id = image_file.split("/")[-1].split("_TCI")[0]
    mask_id = mask_file.split("/")[-1].split(".tif")[0]
    mask_date = mask_id.split("_")[2]
    mask_dates.append(mask_date)
    assert image_id == mask_id
    image = tifffile.imread(image_file)
    image_ds = cv2.resize(image, (1830, 1830), interpolation=cv2.INTER_LINEAR)
    mask = tifffile.imread(mask_file)
    water_mask = np.isin(mask, [6]).astype(np.uint8)  # water has a class index 6
    lake_mask = water_mask[1000:, :1100]
    lake_area = lake_mask.sum() * 60 * 60 / (1000 * 1000)  # calculate the surface area
    lake_areas.append(lake_area)
    contour, _ = cv2.findContours(water_mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    combined = cv2.drawContours(image_ds, contour, -1, (255, 0, 0), 4)
    lake_crop = combined[1000:, :1100]
    cv2.putText(lake_crop, f"{mask_date}", (10,50), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 0), 3, cv2.LINE_AA)
    cv2.putText(lake_crop, f"{lake_area} [sq km]", (10,100), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 0), 3, cv2.LINE_AA)
    overlay_file = overlay_dir + '/' + mask_date + '.png'
    cv2.imwrite(overlay_file, cv2.cvtColor(lake_crop, cv2.COLOR_RGB2BGR))

# Plot water surface area vs. time.
plt.figure(figsize=(20,10))
plt.title('Lake Mead surface area for the 2021.02 - 2022.07 period.', fontsize=20)
plt.xticks(rotation=45)
plt.ylabel('Water surface area [sq km]', fontsize=14)
plt.plot(mask_dates, lake_areas, marker='o')
plt.grid('on')
plt.ylim(240, 320)
for i, v in enumerate(lake_areas):
    plt.text(i, v+2, "%d" %v, ha='center')
plt.show()

We plot the water surface area over time in the following figure. The water surface area clearly decreased between February 2021 and July 2022. In less than 2 years, Lake Mead’s surface area decreased from over 300 km2 to less than 250 km2, an 18% relative change.

import imageio.v2 as imageio
from IPython.display import HTML

frames = []
filenames = glob('./masks/lake_mead_overlay/*.png')
filenames.sort()
for filename in filenames:
    frames.append(imageio.imread(filename))
imageio.mimsave('lake_mead.gif', frames, duration=1)
HTML('<img src="./lake_mead.gif">')

We can also extract the lake’s boundaries and superimpose them over the satellite images to better visualize the changes in lake’s shoreline. As shown in the following animation, the north and southeast shoreline have shrunk over the last 2 years. In some months, the surface area has reduced by more than 20% year over year.

Lake Mead surface area animation

Conclusion

We have witnessed the impact of climate change on Lake Mead’s shrinking shoreline. SageMaker now supports geospatial machine learning (ML), making it easier for data scientists and ML engineers to build, train, and deploy models using geospatial data. In this post, we showed how to acquire data, perform analysis, and visualize the changes with SageMaker geospatial AI/ML services. You can find the code for this post in the amazon-sagemaker-examples GitHub repo. See the Amazon SageMaker geospatial capabilities to learn more.

References

[1] https://climate.nasa.gov/

[2] https://www.nps.gov/lake/learn/nature/overview-of-lake-mead.htm

[3] https://earthobservatory.nasa.gov/images/150111/lake-mead-keeps-dropping


About the Authors

 Xiong Zhou is a Senior Applied Scientist at AWS. He leads the science team for Amazon SageMaker geospatial capabilities. His current area of research includes computer vision and efficient model training. In his spare time, he enjoys running, playing basketball and spending time with his family.

Anirudh Viswanathan is a Sr Product Manager, Technical – External Services with the SageMaker geospatial ML team. He holds a Masters in Robotics from Carnegie Mellon University, an MBA from the Wharton School of Business, and is named inventor on over 40 patents. He enjoys long-distance running, visiting art galleries and Broadway shows.

Trenton Lipscomb is a Principal Engineer and part of the team that added geospatial capabilities to SageMaker. He has been involved in human in the loop solutions, working on the services SageMaker Ground Truth, Augmented AI and Amazon Mechanical Turk.

Xingjian Shi is a Senior Applied Scientist and part of the team that added geospatial capabilities to SageMaker. He is also working on deep learning for Earth science and multimodal AutoML.

Li Erran Li is the applied science manager at humain-in-the-loop services, AWS AI, Amazon. His research interests are 3D deep learning, and vision and language representation learning. Previously he was a senior scientist at Alexa AI, the head of machine learning at Scale AI and the chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and the machine learning platform team at Uber working on machine learning for autonomous driving, machine learning systems and strategic initiatives of AI. He started his career at Bell Labs and was adjunct professor at Columbia University. He co-taught tutorials at ICML’17 and ICCV’19, and co-organized several workshops at NeurIPS, ICML, CVPR, ICCV on machine learning for autonomous driving, 3D vision and robotics, machine learning systems and adversarial machine learning. He has a PhD in computer science at Cornell University. He is an ACM Fellow and IEEE Fellow.