AWS for Industries

Serverless Remote Sensing analytics for an electric utility customer

Overview

Power and Utility (P&U) companies handle all aspects of electricity provision, including generation, transmission, and distribution. They usually serve many retail customers within a service area that could easily cover millions of residents. To mitigate the risk of wildfires throughout the year, power companies collaborate with various government entities, such as federal, state, tribal, and local governments, as well as fire agencies operating within their service area. This collaboration involves the sharing of monitoring resources and geospatial data. The power company provides access to its Geographic Information System (GIS), wildfire monitoring cameras, and weather stations.

The Remote Sensing department at a P&U company supplies the necessary data for planning and implementing activities aimed at mitigating wildfire risks. This includes imagery data obtained from aerial Light Detection and Ranging (LiDAR) platforms and satellite imagery. These data sources differ in terms of cost, spatial resolution, and collection frequency. LiDAR data offers the most detailed information and highest resolution for vegetation detection and management, but it is also the most expensive and is typically collected every two to three years. On the other hand, satellite data is more cost-effective and can be collected as needed with a lead time of a few days. Recently, some P&U Remote Sensing teams began exploring the use of lower-cost, low-latency imagery to aid in power line inspections after outage events, aiming to enhance operational efficiency. During a Public Safety Power Shutoff (PSPS) event, power companies must prioritize the swift restoration of power while making sure that safety requirements are met.

Goals and use case

Historically, a P&U company’s Remote Sensing department used LiDAR technology to determine the spatial relationships between vegetation and its assets, such as substations, power poles, and power lines. Although LiDAR is highly accurate and provides good spatial resolution, it has limited temporal frequency because of the time and cost needed for data collection, which restricts its use to roughly once every three years. This poses a challenge because substantial vegetation growth can occur between data collections, making it difficult to predict the current spatial relationships between the P&U company’s assets and adjacent vegetation.

To address this limitation, Amazon Web Services (AWS) Professional Services worked with power companies to develop an integrated Remote Sensing analytics solution that optimizes costs by using imagery sources that are collected more frequently at a lower price point. The primary objective was to predict the line of sight (LOS) of power assets (poles and lines) from an aerial perspective. This capability is particularly useful after PSPS events, when clients need to visually inspect power lines for potential vegetation encroachments or downed lines. Aerial inspections, such as those using helicopters or drones, significantly expedite the inspection process, especially in areas that are difficult to access on foot or with ground vehicles. However, the effectiveness of aerial inspections depends on recent vegetation growth not obstructing the LOS.

Therefore, the focus of this project was to combine frequent data acquisition with a computer vision (CV) model that can predict LOS. This approach enhances situational awareness in the field, supports PSPS planning, and complements the client’s field inspection strategy. During a fire event, P&U companies may need to shut off power transmission to specific areas. When the fire is extinguished, the client aims to restore power as quickly as possible. The power company’s operations teams are responsible for inspecting all power lines in a High-Risk Fire Zone (HRFZ) for potential hazards. The purpose of this engagement was to assess whether there is a clear LOS from above to the transmission spans. Knowing the LOS status for these spans helps determine the appropriate inspection method, specifically whether airborne drones or rotary aircraft are suitable for inspecting a particular area.

Data preprocessing and modeling strategy

The AWS Professional Services team accessed various sources of aerial imagery and the LiDAR-derived spatial information of vegetation, as shown in the following figure. The overall approach for integrating data, labeling, model training, and inference is as follows. The analysis incorporates the fixed positions of P&U company assets (for example, power poles, lines, and substations) available through the company’s internal GIS system. The second data source consists of LiDAR-derived data, which provides precise information about the location and extent of vegetation near the client’s assets. This data is represented as shapefiles that describe the polygons of tree crowns. Because these two data sources were collected around the same time, they can be used to train a machine learning (ML) model to determine if a power line spatially overlaps with a tree crown, indicating obstruction of LOS.

Figure 1: Overall strategy for data integration, labeling, model training, and inference. The analysis integrates the known, static positions of the P&U company assets such as power poles, lines, and substations, available through its internal GIS system.
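
As a minimal illustration of this data integration step (not the project’s actual code), the asset and vegetation layers could be loaded and aligned to a common coordinate reference system with GeoPandas. The file names, layer names, and CRS below are assumptions for the example.

```python
# Minimal sketch, assuming hypothetical file and layer names:
# load the GIS asset layers and the LiDAR-derived tree crown polygons,
# then reproject everything to one projected CRS so distances and
# intersections are computed in meters.
import geopandas as gpd

lines = gpd.read_file("assets.gdb", layer="power_lines")   # hypothetical layer name
poles = gpd.read_file("assets.gdb", layer="power_poles")   # hypothetical layer name
crowns = gpd.read_file("tree_crowns.shp")                  # LiDAR-derived tree crowns

target_crs = "EPSG:32610"  # example UTM zone; use the zone covering the service area
lines, poles, crowns = (g.to_crs(target_crs) for g in (lines, poles, crowns))
```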

The same data ingestion and training approach can be applied to more recent satellite imagery with lower resolution. Two models were developed using the two data sources: one using high-resolution aerial imagery and the other using lower-resolution satellite imagery. In each case, the large-scale imagery was divided into smaller tiles centered on the middle of a power line, halfway between the supporting poles.
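
The tiling step can be sketched roughly as follows with rasterio and Shapely; the tile size, file paths, and function are illustrative assumptions rather than the project’s implementation.

```python
# Hedged sketch: clip a square tile centered halfway along a span
# (a shapely LineString) from a georeferenced image.
import rasterio
from rasterio.windows import from_bounds

HALF_TILE_M = 60  # half-width of the tile in map units (assumed value)

def extract_span_tile(image_path, span_geom, out_path):
    """Write a tile centered on the midpoint of one power line span."""
    midpoint = span_geom.interpolate(0.5, normalized=True)
    with rasterio.open(image_path) as src:
        window = from_bounds(
            midpoint.x - HALF_TILE_M, midpoint.y - HALF_TILE_M,
            midpoint.x + HALF_TILE_M, midpoint.y + HALF_TILE_M,
            transform=src.transform,
        )
        tile = src.read(window=window)
        profile = src.profile.copy()
        profile.update(
            height=tile.shape[1], width=tile.shape[2],
            transform=src.window_transform(window),
        )
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(tile)
```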

To label the data and train the model, three sources of information were integrated. First, the model needs the location of the tree canopy, which is available from the LiDAR data. Second, the model needs the location of the power line, which is obtained from the GIS database. Third, the model needs the overhead imagery, which can be obtained from either aerial or satellite images.

The first step of the project involved formally labeling the data in a format suitable for ingestion by an ML algorithm. To combine the three sources of information in a format suitable for a CV model, the known fixed locations of the client’s assets, such as the power lines, were overlaid on the aerial imagery together with the vegetation information, such as the extent of tree crowns, as depicted in the following figure. To label the data, it was determined whether the tree polygons in each image intersected with the power line. In this specific use case, images with no more than one intersection between the power line and any tree polygon were labeled as LOS, while images with more than one intersection were labeled as NO-LOS. Following these steps, an image classification model was trained to output whether each line is LOS or NO-LOS.

Figure 2: Example images used for training and inference. Left: A power line overlapping with several tree canopies, labeled as NO-LOS. Right: The power line isn’t intersecting with any tree canopy and is labeled as LOS.
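
A simplified sketch of this labeling rule, assuming the span and crown geometries have already been loaded into GeoDataFrames in the same CRS (as in the earlier snippet), might look like the following; the threshold of one intersection mirrors the rule described above, and the column names are illustrative.

```python
# Sketch of the LOS / NO-LOS labeling rule for a single span geometry.
def label_span(span_geom, crowns_gdf):
    """Return 'LOS' or 'NO-LOS' based on tree crown intersections with the span."""
    # Use the spatial index to narrow down candidate crowns, then count
    # how many actually intersect the span.
    candidates = crowns_gdf.iloc[list(crowns_gdf.sindex.query(span_geom))]
    n_hits = int(candidates.intersects(span_geom).sum())
    return "LOS" if n_hits <= 1 else "NO-LOS"

# Label every span in the lines GeoDataFrame
lines["label"] = lines.geometry.apply(lambda g: label_span(g, crowns))
```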

Model training and performance

Using the imaging and GIS data from two HRFZs, we extracted and labeled more than 40,000 images centered on power line spans. The images were organized into folders based on whether they were labeled as LOS or NO-LOS, and subsequently uploaded to Amazon Simple Storage Service (Amazon S3). Then, these extracted images were used as input for an image classification training job in the Amazon SageMaker Studio environment. Initial experiments within the Studio environment compared different image classification model architectures, such as ResNet and Swin Transformer, and revealed that the ResNet-100 architecture provided the best performance. For this architecture, we further created a hyperparameter tuning job to search for optimal model training parameters. The final model selected achieved the following performance metrics: an accuracy of 76.5%, a precision of 68%, and a recall of 30.8%. With appropriate post-processing techniques, the model’s performance on classifying power line segments can be further improved, potentially achieving an F1 score exceeding 85%.
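
For illustration only, a training and hyperparameter tuning setup of this kind could be expressed with the SageMaker Python SDK roughly as follows; the bucket names, instance type, hyperparameters, and tuning ranges are assumptions, not the values used in the project.

```python
# Hedged sketch: train the built-in SageMaker image classification algorithm
# and tune its learning rate. Paths and settings are placeholders.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Built-in image classification container for the current Region
image_uri = sagemaker.image_uris.retrieve("image-classification", session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://example-bucket/los-model/output",  # hypothetical bucket
    hyperparameters={"num_classes": 2, "num_training_samples": 40000, "epochs": 30},
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-4, 1e-1)},
    max_jobs=10,
    max_parallel_jobs=2,
)

# Note: the built-in algorithm also expects .lst or RecordIO-format inputs;
# those details are omitted here for brevity.
tuner.fit({
    "train": "s3://example-bucket/los-tiles/train",           # hypothetical prefixes
    "validation": "s3://example-bucket/los-tiles/validation",
})
```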

Deployment of a serverless, automated, and scalable batch inference solution

An automated inference pipeline based on SageMaker Batch Transform was developed around the trained vegetation-asset collision detection model. During production, the pipeline follows these steps. First, a current .gdb file containing geolocated information about client assets, such as poles and power lines, needs to be present in a specified location in Amazon S3. Second, the user uploads a collection of TIFF files with aerial imagery to a specified location in Amazon S3. These TIFF files are expected to contain geolocation information. Uploading the TIFF files to Amazon S3 triggers an AWS Lambda function that processes the files by extracting smaller tile images centered on each span of line, based on the pole and line locations from the .gdb file. When the tile images have been extracted to Amazon S3, the Lambda function invokes a SageMaker Batch Transform job. This job uses the trained model to perform inference on each tile image. After the job is complete, the Lambda function receives the inference results and performs post-processing steps. The purpose of these steps is to format the inference results so that they can be used with the power company’s internal GIS tools. Finally, the Lambda function sends Amazon Simple Notification Service (Amazon SNS) notifications to the users. The following figure provides an outline of the process. This figure shows a high-level architecture; consult AWS security best practices and follow the AWS shared responsibility model before implementing it.

Figure 3: Outline of data ingestion, processing, and inference during production.
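
A minimal sketch of the step where the Lambda function starts the batch inference job could look like the following; the bucket, prefix, topic, and environment variable names are hypothetical, error handling is omitted, and the post-processing and final notification steps described above would follow separately.

```python
# Hedged sketch of a Lambda handler that launches a SageMaker Batch Transform
# job over the extracted tiles and notifies users through Amazon SNS.
import os
import time
import boto3

sagemaker_client = boto3.client("sagemaker")
sns = boto3.client("sns")

def handler(event, context):
    job_name = f"los-batch-{int(time.time())}"
    sagemaker_client.create_transform_job(
        TransformJobName=job_name,
        ModelName=os.environ["MODEL_NAME"],  # trained LOS model, set as an env variable
        TransformInput={
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-bucket/tiles/",   # hypothetical tile prefix
            }},
            "ContentType": "application/x-image",
        },
        TransformOutput={"S3OutputPath": "s3://example-bucket/predictions/"},
        TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    )
    sns.publish(
        TopicArn=os.environ["SNS_TOPIC_ARN"],
        Message=f"Started LOS batch inference job {job_name}",
    )
    return {"job_name": job_name}
```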

This architecture offers several benefits to the customer. First, it is cost-effective. SageMaker Batch Transform only uses computing resources for the duration needed to perform inference on the tile images. The process of setting up and shutting down computing resources is automated and transparent to the customer. Similarly, the Lambda function exits immediately after completing its task and stops consuming resources. The second benefit is scalability. Although the initial project focused on two specific zones served by the client, the solution can analyze a larger geographical extent without limitations. Regardless of the input size, whether it’s a small region or an area as large as a US state, the system can process the data in approximately 30-50 minutes. This consistent processing time is achieved through massive parallelization, where the workload is distributed across multiple Lambda functions running concurrently. This architecture makes sure that processing time remains relatively constant even as the input size increases, demonstrating excellent horizontal scalability. Lastly, the solution is expandable. If the client purchases new imagery, then only a few well-defined steps are needed to retrain the model and update the pipeline. The remaining parts of the pipeline stay virtually unchanged, and the only change needed from the user is updating the model identifier in an environment variable for the Lambda function, as sketched below.
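
For example, pointing the pipeline at a retrained model could come down to a single API call; the function name, variable name, and model identifier here are assumptions for illustration.

```python
# Hedged sketch: update the Lambda environment variable that names the model.
# Note that update_function_configuration replaces the full set of environment
# variables, so a real script would merge in any existing ones first.
import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="los-batch-inference-trigger",                   # hypothetical name
    Environment={"Variables": {"MODEL_NAME": "los-resnet-v2"}},   # new model identifier
)
```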

Visualization of inference results with GIS tools

As mentioned earlier, the model generates predictions and labels for each segment of the power line, indicating whether it is suitable for aerial inspection. The initial analysis provides two key findings. First, it reveals the relative prevalence of LOS and NO-LOS segments. The following figure illustrates these results for the initial zones of interest.

Figure 4: Examples of spatial distribution of power lines with LOS (blue) and NO-LOS (red) within the HRFZs.

Second, the analysis identifies contiguous sections of the power line that have LOS, indicating the potential for aerial surveillance assets, such as drones or helicopters, to operate in those areas. The output of the batch inference job is a geospatial file, readable with GeoPandas, that contains geolocation information and an LOS or NO-LOS prediction for each line segment. This file can be imported into GIS tools. In this case, ArcGIS was used to visualize the model predictions spatially, enabling the power company to make informed decisions and optimize the allocation of aerial assets. The project focused on two high-risk fire zones, namely Zone A and Zone B. The output was further visualized using ArcGIS for these two HRFZs. The ML models indicate that 39% of line miles in Zone A and 50% in Zone B have LOS. This information can be valuable for inspection crews and management when planning the allocation of aerial and ground-based inspection assets.
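
As a rough illustration of how such a summary could be derived from the inference output, the following snippet loads the predictions with GeoPandas and computes the share of line miles with LOS per zone; the file name, column names, and CRS are illustrative assumptions.

```python
# Hedged sketch: summarize LOS line miles per high-risk fire zone.
import geopandas as gpd

spans = gpd.read_file("los_predictions.geojson")  # hypothetical output file

# Compute span lengths in miles from a projected CRS (meters), then the share
# of line miles with a clear line of sight in each zone.
spans["miles"] = spans.to_crs("EPSG:32610").geometry.length / 1609.34
summary = (
    spans.groupby(["zone", "prediction"])["miles"].sum()
         .unstack(fill_value=0.0)
)
summary["pct_los"] = 100 * summary["LOS"] / summary.sum(axis=1)
print(summary)
```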

Conclusions

This post describes a collaboration between the AWS Professional Services and client teams. The teams worked together to develop an ML model that predicts LOS for two HRFZs using various datasets, such as LiDAR, satellite, and GIS geospatial data. This provides the P&U company with situational awareness for its PSPS planning and inspection strategy. The solution involved a data processing pipeline to prepare the datasets for training and inference. SageMaker training and batch inference were used to generate predictions and labels for each segment of the power line, indicating whether it was suitable for inspection through the appropriate transportation method, maximizing technician efficiency and lowering inspection cost. In summary, the team delivered an automated, scalable, and cost-effective pipeline that allows on-demand operation of the ML models using Lambda functions and SageMaker batch inference.

Xin Li

As a Senior Solutions Developer for the Renewable Energy Optimization Department, based in Houston, TX, Xin is responsible for establishing Amazon's renewable energy asset operations data streaming and analytics services. Xin has more than 15 years of experience in the energy industry. He has a keen interest in IoT, data analytics, AI/ML, and renewable energy optimization, and he is actively developing new solutions aimed at empowering utilities to derive value using AWS best practices.

Dan Iancu

Dan Iancu is a Data Scientist at AWS ProServe, specializing in applying computer vision and IoT technologies to address real-world challenges. His work includes using computer vision to detect invasive species in Hawaiian forests, identify fall hazards in warehouses, and spot wildfires from satellite imagery. By integrating these advanced technologies, Dan effectively tackles practical issues, demonstrating his commitment to delivering impactful solutions. His innovative approach reflects a dedication to leveraging cutting-edge tools to solve pressing environmental and safety concerns.

Rishi Pathak

Rishi Pathak is a Principal Portfolio Manager for Data & AI at AWS, bringing over 20 years of experience in enterprise technology solutions. His areas of focus include data technologies, AI/ML, and generative AI. At AWS, Rishi helps organizations leverage these technologies to achieve their business goals. His role involves shaping solutions in the data and AI space, aligning them with customer needs and industry trends.