Bushfire mitigation through Machine Learning with AusNet and AWS
Eastern Australia is among the most fire-prone regions in the world. Although bushfires are a regular occurrence in Australia, the 2019–2020 bushfire crisis burned over 17 million hectares of land (an area larger than England) and cost the Australian economy more than $100 billion across property, infrastructure, social, and environmental costs.
With increasingly extreme weather events, bushfire risk in Australia isn’t going away anytime soon. This means the responsibility on Australia’s energy network operators to maintain a safe and reliable supply has never been greater.
The Australian energy network includes over 880,000 kilometers of distribution and transmission lines (approximately 22 trips around the Earth’s circumference) and 7 million power poles. Extreme climate conditions and vegetation growth close to power lines have to be carefully managed to mitigate bushfire risk.
In this post, we discuss how AusNet uses machine learning (ML) and Amazon SageMaker to help mitigate bushfires.
AusNet innovation with LiDAR
AusNet manages 54,000 kilometers of power lines and brings energy to more than 1.5 million Victorian homes and businesses. 62% of this network is located in high bushfire risk areas. AusNet has developed an innovative solution to safely maintain its energy network and minimize the risk of vegetation causing damage to the network.
Since 2009, AusNet has been capturing high-quality LiDAR data across the network using both aerial and road-based mapping systems. LiDAR is a remote-sensing method that uses light in the form of a pulsed laser to measure distances and directions. A sensed point of an object has 3D coordinate information (x, y, z) as well as additional attributes such as intensity, number of returns, return number, GPS timestamp, and so on. Those points are represented as a 3D point cloud, which is a collection of all the point information. Upon processing, the LiDAR data is turned into a 3D model of AusNet’s network assets, identifying the vegetation growth that needs to be trimmed for bushfire safety.
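As a rough illustration of the point cloud structure described above, the following sketch models a single LiDAR return as a small record and a point cloud as a list of such records. The class name, field names, and sample values are assumptions made for this example and are not AusNet's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LidarPoint:
    """One LiDAR return. Field names are illustrative, not AusNet's schema."""
    x: float
    y: float
    z: float
    return_number: int      # which return of the laser pulse this point is (1 = first)
    number_of_returns: int  # how many returns the pulse produced in total
    gps_time: float         # acquisition timestamp in seconds


Corner = Tuple[float, float, float]


def bounding_box(cloud: List[LidarPoint]) -> Tuple[Corner, Corner]:
    """Return the (min, max) corners of the axis-aligned box enclosing the cloud."""
    xs = [p.x for p in cloud]
    ys = [p.y for p in cloud]
    zs = [p.z for p in cloud]
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# A point cloud is simply the collection of all sensed points (made-up values).
cloud = [
    LidarPoint(10.0, 5.0, 2.0, 1, 2, 100.0),
    LidarPoint(10.5, 5.2, 14.0, 1, 1, 100.1),
    LidarPoint(9.8, 4.9, 0.3, 2, 2, 100.0),
]
lower, upper = bounding_box(cloud)
```

Real survey files hold millions of such records, so production code would typically use an array-based representation rather than a list of objects.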
The previous process for LiDAR classification used business rule-driven inference, with a heavy reliance on accurate Geographic Information System (GIS) asset locations to drive automation. Manual labor effort using custom-built labeling tools was required to correctly label LiDAR points where asset locations were inaccurate or simply didn’t exist. The manual correction and classification of LiDAR points increased processing turnaround times and made it difficult to scale.
AusNet and Amazon Machine Learning
AusNet’s Geospatial team partnered with the Amazon ML specialists, including the Amazon Machine Learning Solutions Lab and Professional Services, to investigate how ML could automate LiDAR point classification and accelerate the onerous process of manually correcting inaccurate GIS location data.
The cost of accurately classifying trillions of captured LiDAR points that represent the different network configurations around Australia exceeded $700,000 per year and inhibited AusNet’s ability to expand this work to larger areas of the network.
AusNet and AWS teamed up to use Amazon SageMaker to experiment with and build deep learning models that automate the point-wise classification of this large collection of LiDAR data. Amazon SageMaker is a fully managed service that helps data scientists and developers prepare, build, train, and deploy high-quality machine learning models quickly. The AusNet and AWS team successfully built a semantic segmentation model that accurately classified 3D point cloud data into the following categories: conductor, building, pole, vegetation, and others.
Outcomes for AusNet and bushfire mitigation
The collaboration between AWS and AusNet was a huge success, producing the following outcomes for both the business and bushfire risk reduction:
- Increased worker safety by using LiDAR data and reducing the need for engineers, surveyors, and designers to travel to sites
- Achieved 80.53% mean accuracy across all five segmentation categories, saving AusNet an estimated AUD $500,000 per year through automated classification
- Provided 91.66% and 92% accuracy in detecting conductors and vegetation, respectively, improving automatic classification of the two most important segment classes
- Provided the flexibility to utilize LiDAR data obtained from drones, helicopters, planes, and ground-based vehicles, while accounting for each data source’s unique variability
- Enabled the business to innovate faster and scale analytics across its entire energy network by increasing ML automation and reducing the dependency on GIS reference data and manual correction processes
The following table depicts the performance of the semantic segmentation model on unseen data (measured using “precision” and “recall” metrics, with higher being better), across the five categories.
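For readers unfamiliar with the two metrics, the following sketch shows how per-class precision and recall can be computed from true and predicted point labels. The toy labels are invented for illustration and are not AusNet's results; the function name is likewise an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

CATEGORIES = ["conductor", "building", "pole", "vegetation", "others"]


def precision_recall(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float]]:
    """Return {category: (precision, recall)} for point-wise labels."""
    actual = Counter(y_true)     # points that truly belong to each class
    predicted = Counter(y_pred)  # points the model assigned to each class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = {}
    for c in CATEGORIES:
        # Precision: of the points predicted as c, how many really are c.
        precision = true_pos[c] / predicted[c] if predicted[c] else 0.0
        # Recall: of the points that really are c, how many were found.
        recall = true_pos[c] / actual[c] if actual[c] else 0.0
        scores[c] = (precision, recall)
    return scores


# Toy example with made-up labels for ten points.
y_true = ["vegetation"] * 6 + ["conductor"] * 2 + ["pole", "building"]
y_pred = ["vegetation"] * 5 + ["conductor", "conductor", "vegetation", "pole", "building"]
scores = precision_recall(y_true, y_pred)
```

Reporting both numbers per class matters for imbalanced data: a model can reach high overall accuracy while still missing most points of a rare class such as conductors.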
ML model classified points from a helicopter capture:
The ML Solutions Lab team brought in a team of highly experienced ML scientists and architects to help drive innovation and experimentation. With cutting-edge ML experience across industries, the team collaborated with AusNet’s Geospatial team to solve some of the most challenging technology problems for the business. Based on the deep ML capabilities of SageMaker, AusNet and AWS were able to complete the pilot in just 8 weeks.
The breadth and depth of SageMaker played a key role in allowing the developers and data scientists from both AusNet and AWS to collaborate on the project. The team utilized code and notebook-sharing features and easily accessed on-demand ML compute resources for training. The elasticity of SageMaker enabled the team to iterate quickly. The team was also able to take advantage of the availability of different hardware configurations to experiment on AWS without needing to invest in upfront capital to acquire on-premises hardware. This allowed AusNet to easily pick the right-sized ML resources and scale their experiments on demand. The flexibility and availability of GPU resources are critical, especially when the ML task requires cutting-edge experiments.
We used SageMaker notebook instances for exploring the data and developing preprocessing code, and used SageMaker processing and training jobs for large-scale workloads. The team also used hyperparameter optimization (HPO) to quickly iterate on multiple training jobs with various configurations and dataset versions to fine-tune the hyperparameters and find the best performing model. For example, we created different versions of datasets using downsampling and augmentation methods to overcome data imbalance issues. Running multiple training jobs with different datasets in parallel made it possible to find the right dataset quickly. With large and imbalanced point cloud datasets, SageMaker provided the ability to iterate quickly using many configurations of experiments and data transformations.
ML engineers could conduct initial explorations of data and algorithms using low-cost notebook instances, then offload heavy data operations to the more powerful processing instances. Per-second billing and automatic lifecycle management make sure that the more expensive training instances are started and stopped automatically and only remain active for as long as necessary, which increases utilization efficiency.
The team was able to train a model at a rate of 10.8 minutes per epoch on 17.2 GiB of uncompressed data across 1,571 files totaling approximately 616 million points. For inference, the team was able to process 33.6 GiB of uncompressed data across 15 files totaling 1.2 billion points in 22.1 hours. This translates to inferencing an average of 15,760 points per second including amortized startup time.
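The inference throughput can be sanity-checked with a little arithmetic. The sketch below uses only the figures quoted above; the helper names are made up for this example. Because the 1.2 billion point count is rounded, recomputing from it gives a slightly lower rate (roughly 15,100 points per second), and the reported 15,760 points per second corresponds to about 1.25 billion points.

```python
SECONDS_PER_HOUR = 3600.0


def points_per_second(total_points: float, hours: float) -> float:
    """Average throughput over the whole run, startup time included."""
    return total_points / (hours * SECONDS_PER_HOUR)


def implied_points(rate: float, hours: float) -> float:
    """Total points implied by an average rate sustained for the given hours."""
    return rate * hours * SECONDS_PER_HOUR


INFERENCE_HOURS = 22.1   # from the post
REPORTED_RATE = 15_760   # points per second, from the post
ROUNDED_POINTS = 1.2e9   # "1.2 billion points", a rounded figure

rate_from_rounded = points_per_second(ROUNDED_POINTS, INFERENCE_HOURS)
points_from_rate = implied_points(REPORTED_RATE, INFERENCE_HOURS)

print(f"rate from rounded count: {rate_from_rounded:,.0f} points/s")
print(f"points implied by reported rate: {points_from_rate:,.0f}")
```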
Solving the semantic segmentation problem
ML model classified points from a fixed wing capture:
ML model classified points from a mobile capture:
The problem of assigning every point in a point cloud to a category from a set of categories is called a semantic segmentation problem. AusNet’s 3D point clouds from LiDAR datasets consist of millions of points. Accurately and efficiently labeling every point in a 3D point cloud involves tackling two challenges:
- Imbalanced data – Class imbalance is a common problem in real-world point clouds, particularly for outdoor environments. As seen in the preceding clips, the majority of the points are vegetation, while power lines and conductors make up less than 1% of the total points. Models trained on an imbalanced dataset are easily biased toward the majority classes and perform poorly on the minority classes. For this task, it’s critical to classify conductor points well, so training a model that works well on both the majority and minority classes is the largest challenge.
- Large-scale point clouds – A LiDAR survey can cover a large open area, so the volume of point cloud data is large. In AusNet’s case, the number of points per point cloud can range from hundreds of thousands to tens of millions, with each point cloud file varying from hundreds of megabytes up to gigabytes. Most point cloud segmentation ML algorithms require sampling because their operators can’t take all the points as input. Unfortunately, many sampling methods are computationally heavy, which makes both training and inference slow. For this work, we needed to choose the most efficient ML algorithm that works on large-scale point clouds.
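To illustrate why the choice of sampling method matters at this scale, the following sketch contrasts two widely used samplers: uniform random sampling, whose cost does not depend on the geometry, and greedy farthest point sampling, which needs a distance evaluation for every point on every selection step. These two are chosen only as familiar examples; the post does not state which samplers the team evaluated, and all names here are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2


def random_sample(points: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Pick k points uniformly at random; cheap, but ignores spatial coverage."""
    rng = random.Random(seed)
    return rng.sample(list(points), min(k, len(points)))


def farthest_point_sample(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy farthest point sampling: about len(points) * k distance evaluations."""
    if not points or k <= 0:
        return []
    k = min(k, len(points))
    chosen = [0]  # start from the first point
    # Squared distance from every point to its nearest already-chosen point.
    nearest = [_sq_dist(p, points[0]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        for i, p in enumerate(points):
            d = _sq_dist(p, points[idx])
            if d < nearest[i]:
                nearest[i] = d
    return [points[i] for i in chosen]


# Eleven points along a line: FPS picks the two ends, then the middle.
line = [(float(i), 0.0, 0.0) for i in range(11)]
spread = farthest_point_sample(line, 3)
cheap = random_sample(line, 3)
```

With tens of millions of points per file, the extra pass over all points per selected sample is what makes geometry-aware samplers slow, which is the trade-off described in the second bullet above.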
The AWS and AusNet teams devised a novel downsampling strategy based on clustering points to address the heavily imbalanced classes. This downsampling strategy, together with existing mitigations such as class weighting, helped solve the challenges of training an accurate model on an imbalanced dataset and also improved inference performance. We also experimented with an upsampling strategy that duplicates the minority classes and places the copies in different locations. This process was built as a SageMaker Processing job so that it could be applied to newly acquired datasets for further model training within an MLOps pipeline.
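The post does not describe the clustering algorithm itself, so the following is only a stand-in sketch of the general idea: it groups majority-class points into a voxel grid (a very simple form of spatial clustering) and keeps one centroid per voxel, then computes inverse-frequency class weights. The function names, voxel size, and sample points are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

LabeledPoint = Tuple[float, float, float, str]  # x, y, z, category


def downsample_majority(points: List[LabeledPoint], majority: str, voxel: float) -> List[LabeledPoint]:
    """Replace majority-class points with one centroid per voxel; keep other classes as-is."""
    buckets = defaultdict(list)
    kept: List[LabeledPoint] = []
    for x, y, z, label in points:
        if label == majority:
            key = (int(x // voxel), int(y // voxel), int(z // voxel))
            buckets[key].append((x, y, z))
        else:
            kept.append((x, y, z, label))
    for members in buckets.values():
        n = len(members)
        cx = sum(m[0] for m in members) / n
        cy = sum(m[1] for m in members) / n
        cz = sum(m[2] for m in members) / n
        kept.append((cx, cy, cz, majority))
    return kept


def inverse_frequency_weights(points: List[LabeledPoint]) -> Dict[str, float]:
    """Class weights proportional to 1 / class frequency (rarer class, larger weight)."""
    counts = Counter(label for *_, label in points)
    total = sum(counts.values())
    return {c: total / (len(counts) * n) for c, n in counts.items()}


# Made-up points: four vegetation points and one conductor point.
points = [
    (0.1, 0.1, 0.1, "vegetation"),
    (0.2, 0.3, 0.2, "vegetation"),
    (0.4, 0.4, 0.4, "vegetation"),
    (1.5, 0.2, 0.1, "vegetation"),
    (0.5, 0.5, 8.0, "conductor"),
]
reduced = downsample_majority(points, majority="vegetation", voxel=1.0)
weights_before = inverse_frequency_weights(points)
weights_after = inverse_frequency_weights(reduced)
```

In this toy case, downsampling narrows the gap between the classes, so the weight needed to compensate for the rare class shrinks, which is one reason the two mitigations are often combined.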
The teams researched various point cloud segmentation models, considering accuracy, scalability in terms of the number of points, and efficiency. After multiple experiments, we chose a state-of-the-art ML algorithm for point cloud semantic segmentation that met the requirements. We also adopted augmentation methods so that the model could learn from various datasets.
To roll out the point cloud segmentation solution, the team designed an ML pipeline using SageMaker for training and inference. The following diagram illustrates the overall production architecture.
The training pipeline features a custom processing container in SageMaker Processing to perform point cloud format conversion, category remapping, upsampling, downsampling, and splitting of the dataset. The training job takes advantage of the multi-GPU instances in SageMaker with higher memory capacity to support training the model with a larger batch size.
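Two of the preprocessing steps named above, category remapping and dataset splitting, can be sketched in a few lines. The raw label names below are hypothetical (the post does not describe the source taxonomy), and the hash-based split is one common way to keep a file in the same split across runs, not necessarily the approach used in the project.

```python
import hashlib
from typing import Dict, List

TARGET_CATEGORIES = ["conductor", "building", "pole", "vegetation", "others"]

# Hypothetical raw labels; the real source taxonomy is not given in the post.
RAW_TO_TARGET: Dict[str, str] = {
    "hv_conductor": "conductor",
    "lv_conductor": "conductor",
    "house": "building",
    "shed": "building",
    "power_pole": "pole",
    "tree": "vegetation",
    "shrub": "vegetation",
}


def remap(raw_labels: List[str]) -> List[str]:
    """Map raw labels to the five target categories; anything unknown becomes 'others'."""
    return [RAW_TO_TARGET.get(label, "others") for label in raw_labels]


def assign_split(file_name: str, val_fraction: float = 0.2) -> str:
    """Deterministically assign a file to 'train' or 'validation' by hashing its name."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform value in [0, 1)
    return "validation" if bucket < val_fraction else "train"


remapped = remap(["tree", "hv_conductor", "fence"])
split = assign_split("survey_tile_0001.las")  # hypothetical file name
```

Splitting by file rather than by point keeps neighboring points from the same scene out of both the training and validation sets at once.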
AusNet’s LiDAR classification workflow begins with the ingestion of up to terabytes of point cloud data from land and aerial surveillance vehicles into Amazon Simple Storage Service (Amazon S3). The data is then processed and passed into an inference pipeline for point cloud classification. To support this, a SageMaker batch transform job runs batch inference across the dataset, and outputs classified point cloud files with confidence scores. The output is then processed by AusNet’s classification engine, which analyzes the confidence scores and generates an asset management report.
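The post does not detail how the classification engine uses the confidence scores. As one plausible sketch, the code below counts, per category, how many predicted points meet a confidence threshold and how many would be flagged for manual review. The threshold value, function name, and sample predictions are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, float]  # (predicted category, confidence in [0, 1])


def summarize(predictions: List[Prediction], threshold: float = 0.8) -> Dict[str, Dict[str, int]]:
    """Count accepted vs. review-flagged points per category (threshold is hypothetical)."""
    report: Dict[str, Dict[str, int]] = {}
    for category, confidence in predictions:
        entry = report.setdefault(category, {"accepted": 0, "review": 0})
        entry["accepted" if confidence >= threshold else "review"] += 1
    return report


# Made-up model output for four points.
predictions = [
    ("conductor", 0.95),
    ("conductor", 0.55),
    ("vegetation", 0.99),
    ("pole", 0.70),
]
report = summarize(predictions)
```

A summary like this lets reviewers focus on the small share of low-confidence points instead of relabeling whole files.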
One of the key aspects of the architecture is that it provides AusNet with a scalable and modular approach to experiment with new datasets, data processing techniques, and models. With this approach, AusNet can adapt their solution to changing environmental conditions and adopt future point cloud segmentation algorithms.
Conclusion and next steps with AusNet
In this post, we discussed how AusNet’s Geospatial team partnered with Amazon ML scientists to automate LiDAR point classification by completely removing the dependency on GIS location data from the classification task. This removes the delay incurred by manual GIS correction, making the classification task faster and more scalable.
“Being able to quickly and accurately label our aerial survey data is a critical part of minimizing the risk of bushfires. Working with the Amazon Machine Learning Solutions Lab, we were able to create a model that achieved 80.53% mean accuracy in data labeling. We expect to be able to reduce our manual labeling efforts by up to 80% with the new solution,” says Daniel Pendlebury, Product Manager at AusNet.
AusNet envisions ML classification models playing a significant role in driving efficiencies across their network operations. By expanding their automatic classification libraries with new segmentation models, AusNet can utilize vast datasets more productively to ensure the safe, reliable supply of energy to communities throughout Victoria.
The authors would like to thank Sergiy Redko, Claire Burrows, William Manahan, Sahil Deshpande, Ross King, and Damian Bisignano of AusNet for their involvement in the project and bringing their domain expertise on LiDAR datasets and ML training using different ML algorithms.
Amazon ML Solutions Lab
Amazon ML Solutions Lab pairs your team with ML experts to help you identify and implement your organization’s highest-value ML opportunities. If you’d like help with accelerating your use of ML in your products and processes, please contact the Amazon ML Solutions Lab.
About the Authors
Daniel Pendlebury is a Product Manager at AusNet Services specializing in the provision of innovative, automated compliance products to utilities in the Vegetation Management and Asset Maintenance areas.
Nathanael Weldon is a geospatial software developer at AusNet Services. He specializes in building and tuning large-scale geospatial data processing systems, with experience across the utilities, resources, and environmental sectors.
Simon Johnston is an AI leader responsible for the Amazon Web Services AI/ML business across Australia and New Zealand, specializing in AI strategy and economics. He has 20+ years of research, management, and consulting experience (US, EU, APAC) covering a range of innovative, industry-led research and commercialization AI ventures, engaging across start-ups, SMEs, large corporations, and the wider ecosystem.
Derrick Choo is a Solutions Architect at Amazon Web Services. He is based in Melbourne, Australia and works closely with enterprise customers to accelerate their journey in the cloud. He is passionate about helping customers create value through innovation and building scalable applications and has a particular interest in AI and ML.
Muhyun Kim is a data scientist at Amazon Machine Learning Solutions Lab. He solves customers’ various business problems by applying machine learning and deep learning, and also helps them build their skills.
Sujoy Roy is a scientist with the Amazon Machine Learning Solutions Lab with 20+ years of academic and industry experience building and deploying ML-based solutions for business problems. He has applied machine learning to solve customer problems in industries like telco, media and entertainment, AdTech, remote sensing, retail, and manufacturing.
Jiyang Kang is a Senior Deep Learning Architect at Amazon ML Solutions Lab, where he helps AWS customers across multiple industries with AI and cloud adoption. Prior to joining the Amazon ML Solutions Lab, he worked as a Solutions Architect for one of AWS’ most advanced enterprise customers, designing various global scale cloud workloads on AWS. He previously worked as a software developer and system architect for companies such as Samsung Electronics in industries such as semiconductors, networking, and telecommunications.
Eden Duthie is the lead of the Reinforcement Learning Professional Services team at AWS. Eden is passionate about developing decision making solutions for customers. He is especially interested in helping industrial customers with a strong focus on supply chain optimization.