AWS Database Blog

How power utilities analyze and detect harmonic issues using power quality and customer usage data with Amazon Timestream

In this two-part series, we demonstrate how to use the Amazon Timestream database and its built-in time series functions to identify harmonic issues at scale by correlating metrics for millions of customers, and how to automate the process for large-scale data handling.

An electricity utility normally engages in electricity generation and the distribution of electricity to end users, such as Commercial and Industrial (C&I) or residential customers. Many utilities are experiencing an increase in harmonics in their Transmission and Distribution (T&D) systems due to the proliferation of Distributed Energy Resources (DER) such as solar PhotoVoltaic (PV), power electronics loads such as Variable Frequency Drives (VFD), and electric vehicle (EV) battery chargers. This modern electrical equipment acts as a non-linear load (a load whose current does not follow the applied voltage waveform) and produces harmonic current that flows back into the power system. The harmonic current causes harmonic voltage distortion, which deteriorates power quality.

Power quality is very important to both utilities and end users, because any deviation from the expected levels of power quality can cause equipment damage or malfunction, such as overloading or overheating, which shortens the equipment’s life and reduces its efficiency. Poor power quality can also degrade the power delivered to customers, affect customer equipment performance, or even cause safety issues, system shutdown, and data loss. Therefore, understanding power quality issues, detecting and fixing problems, and maintaining good power quality are crucial to ensuring safe and efficient electrical systems.


Harmonics have long been a power quality concern, and are expected to become a more significant one for utilities as the grid evolves. Typically, most harmonics in distribution systems are produced by customers’ loads. A unique challenge presented by harmonics in modern electrical distribution systems is that no individual customer is solely responsible for their adverse effects. Instead, it is the aggregation of distributed harmonics sources in the electrical distribution system that collectively results in power quality problems. For the utility to investigate and resolve harmonics issues, a large amount of harmonics data collected from many customer locations is required.

The IEEE 519 standard defines the limits for voltage and current harmonic distortion in electric power systems. Its purpose is to share the responsibility: users of the electrical system limit their harmonic current emissions to values below the recommended limits, and the utility likewise limits the voltage distortion level to values below the recommended limits.

The following figure shows an actual utility circuit with geographical identifications removed. In this example, the primary harmonics meter is installed at the triangle location, with 30 meters installed at cross segments of the same feeder. Meter data is collected and transmitted to the data repository every 15 minutes. The goal is to investigate which customers may be injecting excessive harmonics into the systems. In the analysis, harmonics meter Voltage Total Harmonics Distortion (VTHD) and customer consumption (kWh) are used to calculate the correlation.
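For reference, voltage total harmonic distortion is conventionally defined as the ratio of the combined RMS of the harmonic components to the RMS of the fundamental. The symbols below are introduced here for illustration only and are not part of the original analysis: V_1 is the RMS voltage of the fundamental, V_h is the RMS voltage of the h-th harmonic, and H is the highest harmonic order considered.

```latex
\mathrm{VTHD} = \frac{\sqrt{\sum_{h=2}^{H} V_h^{2}}}{V_1} \times 100\%
```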

After the correlation calculations, as shown in the following figure, customer 1’s (highlighted in red) consumption has a correlation score of 0.9, which shows a very strong correlation to feeder harmonic variation. Customer 2’s (highlighted in orange) consumption has a correlation score of 0.5, a weaker correlation, and the consumption itself is very small. Therefore, customer 2’s loads don’t appear to be a major source of harmonics. Customer 3’s (highlighted in green) consumption has a correlation score of 0.1, which shows very little correlation to the feeder VTHD; therefore, customer 3’s loads are unlikely to be a significant source of harmonic distortion. The closer the primary harmonics meter is to the source of harmonics, the larger the harmonic distortion reading we get.

Although the correlation can be calculated in real time every 15 minutes, it makes more sense to study the trends and patterns of the harmonics over a certain period of time (for example, 1 day, 1 week, or 2 weeks). Also, note that correlation is not causation, and the computed correlation is normally validated by power quality engineers in the field using additional tests to confirm causation.


To accurately detect and locate the harmonics sources, it’s important to reliably collect harmonics and demand data from harmonics and customer meters, clean and normalize the data, and identify the correlation between them.

To properly calculate the correlation between data series, all data sources must use the same interval; however, this may not be true of the time series data from every meter. Therefore, the metering data needs to be cleaned so that timestamps are synchronized. In addition, any missing data needs to be interpolated and normalized before the correlation can be calculated. A large utility normally has over 100 thousand three-phase commercial and industrial customers and thousands of feeders, so the large data volume and added complexity make the situation even worse.
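As a minimal sketch of the timestamp synchronization described above, the following Python snippet snaps raw meter readings to a common 15-minute grid and lists the grid points that still have no reading. The function names `floor_to_interval` and `find_missing_intervals` and the sample readings are hypothetical and for illustration only; they are not part of Timestream.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)  # assumed meter reporting interval


def floor_to_interval(ts, interval=INTERVAL):
    """Snap a timestamp down to the start of the interval that contains it."""
    epoch = datetime(1970, 1, 1)
    steps = (ts - epoch) // interval  # whole intervals elapsed since the epoch
    return epoch + steps * interval


def find_missing_intervals(timestamps, start, end, interval=INTERVAL):
    """Return the grid points in [start, end] that have no reading."""
    seen = {floor_to_interval(ts, interval) for ts in timestamps}
    missing = []
    current = start
    while current <= end:
        if current not in seen:
            missing.append(current)
        current += interval
    return missing


# Hypothetical readings: slightly off-grid, with the 23:15 interval absent.
readings = [
    datetime(2022, 9, 30, 23, 0, 4),
    datetime(2022, 9, 30, 23, 30, 1),
    datetime(2022, 9, 30, 23, 45, 0),
]
missing = find_missing_intervals(
    readings, datetime(2022, 9, 30, 23, 0), datetime(2022, 9, 30, 23, 45)
)
print(missing)
```

The intervals reported as missing are the ones that would need to be filled by interpolation before a correlation is calculated.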

To achieve these goals, developers need to write code to clean and prepare the data, debug and maintain that code, and set up the right amount of infrastructure. It makes more sense for business users to focus on the business outcome and the data rather than on writing code that is common across the industry.

Solution overview

Amazon Timestream is a purpose-built managed time series database service that makes it easy to store and analyze trillions of events per day. It’s designed specifically to solve time series use cases and has over 250 built-in functions using standard SQL queries, which eases the pain of writing, debugging, and maintaining thousands of lines of code. It saves time and cost in managing the lifecycle of time series data by keeping recent data in memory and moving historical data to a cost-optimized storage tier based on user-defined policies. Timestream also has built-in time series analytics functions, helping you identify data trends and patterns in near-real time.

In this post, we demonstrate how to calculate the correlation coefficient, which measures how closely two time series datasets (harmonics distortion data and customer consumption data) trend together over time, using the built-in correlation function. With these built-in functions, you can use standard SQL queries to correlate two time series datasets.

Sample data

In the following sample calculation, we selected a circuit with one harmonics meter and 10 customer meters. The calculation is based on a 30-day period.

The following sample data shows our customer meter data.

meter_id           measure_name  time                 meter_measure_value  interval_datetime_utc
Customer_Meter_2   kWh           2022-09-30 23:45:00  0.3                  2022-10-01 06:45:00
Customer_Meter_5   kWh           2022-09-30 23:45:00  48.0                 2022-10-01 06:45:00
Customer_Meter_7   kWh           2022-09-30 23:45:00  67.2                 2022-10-01 06:45:00
Customer_Meter_6   kWh           2022-09-30 23:45:00  0.16                 2022-10-01 06:45:00
Customer_Meter_8   kWh           2022-09-30 23:45:00  83.16                2022-10-01 06:45:00
Customer_Meter_4   kWh           2022-09-30 23:45:00  0.0                  2022-10-01 06:45:00
Customer_Meter_3   kWh           2022-09-30 23:45:00  0.0                  2022-10-01 06:45:00
Customer_Meter_10  kWh           2022-09-30 23:45:00  0.04                 2022-10-01 06:45:00
Customer_Meter_9   kWh           2022-09-30 23:45:00  102.88               2022-10-01 06:45:00
Customer_Meter_1   kWh           2022-09-30 23:45:00  69.12                2022-10-01 06:45:00

The following sample data shows our harmonic meter data.

harmonic_meter_series_id         measure_name  time                 read_timestamp local  meter_measure_value
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 06:45:00  2022-09-30 23:45:00   12.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 06:30:00  2022-09-30 23:30:00   13.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 06:15:00  2022-09-30 23:15:00   13.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 06:00:00  2022-09-30 23:00:00   12.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 05:45:00  2022-09-30 22:45:00   12.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 05:30:00  2022-09-30 22:30:00   12.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 05:15:00  2022-09-30 22:15:00   12.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 05:00:00  2022-09-30 22:00:00   12.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 04:45:00  2022-09-30 21:45:00   12.0
Harmonic_Meter_ONE_VTHD_Phase_A  VTHD          2022-10-01 04:30:00  2022-09-30 21:30:00   12.0

We use the following query to interpolate the data and do the correlation calculations:

WITH cud_result AS (
    -- Interpolate each customer meter's readings onto a regular 10-minute sequence
    SELECT meter_id, INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(time, meter_measure_value),
        SEQUENCE(min('2022-09-01 08:00:00.00000000'), max('2022-09-30 00:00:00.000000000'), 10m)) AS result
    FROM "harmonics"."customer-usage-data"
    WHERE measure_name = 'meter-reading'
      AND time between '2022-09-01 08:00:00.00000000' and '2022-09-30 00:00:00.00000000'
    GROUP BY meter_id, measure_name
),
hmd_result AS (
    -- Interpolate each harmonic meter series onto the same sequence
    SELECT harmonic_meter_series_id, INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(time, meter_measure_value),
        SEQUENCE(min('2022-09-01 08:00:00.00000000'), max('2022-09-30 00:00:00.000000000'), 10m)) AS result
    FROM "harmonics"."Data_HarmonicMeterONE"
    WHERE measure_name = 'meter-reading'
      AND time between '2022-09-01 08:00:00.00000000' and '2022-09-30 00:00:00.00000000'
    GROUP BY harmonic_meter_series_id, measure_name
)
-- Correlate every customer series with every harmonic series
SELECT cud_result.meter_id, hmd_result.harmonic_meter_series_id, correlate_pearson(cud_result.result, hmd_result.result) AS result
FROM cud_result, hmd_result
ORDER BY 1, 2

Meter data cleanup and interpolation

As described in the previous section, incoming meter data can have missing intervals, which complicates the calculations. This query first uses linear interpolation to fill in any missing meter interval data for both the harmonic meter and the regular customer meters. Developers can also use other interpolation methods that Timestream provides, such as cubic spline (INTERPOLATE_SPLINE_CUBIC), last sampled value (INTERPOLATE_LOCF), or a constant value (INTERPOLATE_FILL).

SELECT harmonic_meter_series_id, INTERPOLATE_LINEAR(
        CREATE_TIME_SERIES(time, meter_measure_value), 
       SEQUENCE(min('2022-09-01 08:00:00.00000000'), max('2022-09-30 00:00:00.000000000'), 10m)) AS result  
    FROM "harmonics"."Data_HarmonicMeterONE"  
    WHERE measure_name = 'meter-reading' 
  AND time between '2022-09-01 08:00:00.00000000' and '2022-09-30 00:00:00.00000000'
    GROUP BY  harmonic_meter_series_id, measure_name
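To make the interpolation step concrete, the following Python sketch shows what linear interpolation onto a regular time sequence does conceptually. It is an illustration under stated assumptions, not Timestream's implementation: the function name `interpolate_linear`, the choice to skip grid points outside the observed range, and the sample values are all hypothetical.

```python
from datetime import datetime, timedelta


def interpolate_linear(series, start, end, step):
    """Linearly interpolate (timestamp, value) pairs onto a regular grid.

    Grid points outside the observed range are skipped rather than
    extrapolated (an assumption of this sketch).
    """
    points = sorted(series)
    result = []
    i = 0
    current = start
    while current <= end:
        # Advance so that points[i] is the last reading before `current`.
        while i + 1 < len(points) and points[i + 1][0] < current:
            i += 1
        if points and points[0][0] <= current <= points[-1][0]:
            t0, v0 = points[i]
            if current == t0 or i + 1 == len(points):
                value = v0  # exact hit on an existing reading
            else:
                t1, v1 = points[i + 1]
                fraction = (current - t0) / (t1 - t0)
                value = v0 + fraction * (v1 - v0)
            result.append((current, value))
        current += step
    return result


# Hypothetical VTHD readings with the 22:15 interval missing.
vthd = [
    (datetime(2022, 9, 30, 22, 0), 12.0),
    (datetime(2022, 9, 30, 22, 30), 13.0),
]
filled = interpolate_linear(
    vthd,
    datetime(2022, 9, 30, 22, 0),
    datetime(2022, 9, 30, 22, 30),
    timedelta(minutes=15),
)
for ts, value in filled:
    print(ts, value)
```

In this example the missing 22:15 reading is filled with the midpoint of its two neighbors.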

Correlation calculation

After the data is cleaned up, a Pearson correlation calculation is applied. Based on different business requirements, a Spearman correlation can also be applied.

SELECT cud_result.meter_id, hmd_result.harmonic_meter_series_id, correlate_pearson(cud_result.result, hmd_result.result) AS result 
FROM cud_result, hmd_result order by 1,2
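For readers who want to see what a Pearson correlation computes, here is a minimal Python sketch of the standard formula applied to two aligned series. The function name `pearson` and the sample values are hypothetical and are not taken from the sample data; this is not a description of how Timestream implements `correlate_pearson`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance times n).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical aligned interval readings for one customer and one phase.
kwh = [48.0, 52.0, 60.0, 55.0, 47.0]
vthd = [12.0, 12.5, 13.5, 13.0, 12.0]
r = pearson(kwh, vthd)
print(round(r, 3))
```

Note that the coefficient is undefined when one series is constant, which this sketch reports as an error; a meter that reads 0.0 for the whole period would fall into that case.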

With these simple queries, developers can quickly calculate the correlation results shown in the following sample output. Each row gives the correlation between one customer meter’s readings and the Phase A, B, or C voltage total harmonic distortion. The closer the value is to 1, the stronger the positive relationship between them; values near 0 indicate little or no linear relationship. Electrical engineers can then use these results to help identify the sources of power quality issues.

meter_id           harmonic_meter_series_id         result
Customer_Meter_1   Harmonic_Meter_ONE_VTHD_Phase_A  -0.1581674092708899
Customer_Meter_1   Harmonic_Meter_ONE_VTHD_Phase_B  -0.15778482265690358
Customer_Meter_1   Harmonic_Meter_ONE_VTHD_Phase_C  -0.15639198811140193
Customer_Meter_10  Harmonic_Meter_ONE_VTHD_Phase_A  0.08876736150469715
Customer_Meter_10  Harmonic_Meter_ONE_VTHD_Phase_B  0.08839688077012807
Customer_Meter_10  Harmonic_Meter_ONE_VTHD_Phase_C  0.07278925591321009
Customer_Meter_11  Harmonic_Meter_ONE_VTHD_Phase_A  0.007891559192882137
Customer_Meter_11  Harmonic_Meter_ONE_VTHD_Phase_B  0.006667484345306506
Customer_Meter_11  Harmonic_Meter_ONE_VTHD_Phase_C  -0.0021444035277923047
Customer_Meter_2   Harmonic_Meter_ONE_VTHD_Phase_A  0.2174783625036597
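One possible way to triage output like this is to rank customers by the strength of their correlation. The Python sketch below does that for the Phase A rows copied from the sample output above; the helper name `rank_by_strength` and the choice to rank by absolute value are assumptions made for illustration, not part of the original workflow.

```python
# Phase A rows from the sample output: (meter_id, series_id, correlation).
rows = [
    ("Customer_Meter_1", "Harmonic_Meter_ONE_VTHD_Phase_A", -0.1581674092708899),
    ("Customer_Meter_10", "Harmonic_Meter_ONE_VTHD_Phase_A", 0.08876736150469715),
    ("Customer_Meter_11", "Harmonic_Meter_ONE_VTHD_Phase_A", 0.007891559192882137),
    ("Customer_Meter_2", "Harmonic_Meter_ONE_VTHD_Phase_A", 0.2174783625036597),
]


def rank_by_strength(result_rows):
    """Sort rows so the strongest correlations (by absolute value) come first."""
    return sorted(result_rows, key=lambda row: abs(row[2]), reverse=True)


ranked = rank_by_strength(rows)
for meter_id, series_id, r in ranked:
    print(f"{meter_id:<18} {series_id}  {r:+.3f}")
```

A ranking like this only prioritizes candidates for review; as noted earlier, correlation is not causation, and field validation is still needed.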

To improve performance and lower costs, engineers can use Amazon Timestream’s Scheduled Query feature to run this query at regular intervals. With scheduled queries, you define the real-time analytics queries that compute aggregates, rollups, and other operations on the data, and Amazon Timestream periodically and automatically runs these queries and reliably writes the results into a separate table. The data is typically calculated and written to these tables within a few minutes. For more information on the Scheduled Query feature, refer to Improve query performance and reduce cost using scheduled queries in Amazon Timestream.


In this post, we demonstrated how to use a Timestream database and its built-in time series functions to calculate the correlation between customer energy usage and power quality issues. Amazon Timestream gives utility engineers a way to identify likely sources of power quality issues without writing complex code. In addition, it makes data preparation and processing much easier, including cleaning and processing large volumes of raw time series data, interpolating missing intervals, and calculating correlations between different time series. For more detail about Amazon Timestream’s built-in time series functionality, refer to our documentation.

About the Authors

Bin Qiu is a Global Partner Solution Architect focusing on ER&I at AWS. He has more than 20 years’ experience in the energy and power industries, designing, leading and building different smart grid projects, such as distributed energy resources, microgrid, AI/ML implementation for resource optimization, IoT smart sensor application for equipment predictive maintenance, EV car and grid integration, and more. Bin is passionate about helping utilities achieve digital and sustainability transformations.

Sreenath Gotur is leading the Partner Solutions Factory on the Solution Architecture team at AWS, based out of Charlotte, NC. Prior to joining AWS, he was heading enterprise data management, enterprise data services, and data innovation portfolios with a large financial firm. Sreenath has a special interest in data and analytics, document databases, timeseries databases, and graph databases. In his spare time, he enjoys spending quality time with his family.

Glenn Aiemjoy is a Senior Power Quality Engineer at Pacific Gas & Electric Company (PG&E). He supports electric distribution operations and specializes in troubleshooting and investigating power quality issues. Glenn spends most of his time working with customers, engineers, equipment vendors, and the power quality industry on power quality mitigation and solutions. Prior to joining PG&E in 2013, Glenn completed a master’s degree in Wind Energy Engineering from the Technical University of Denmark. He is passionate about renewable energy and sustainability.

Sandeep Kataria is a Data Scientist at PG&E. He specializes in building data pipelines and implementing machine learning algorithms towards companies’ electric distribution asset maintenance, specifically leading to wildfire prevention and safety. Sandeep joined PG&E in 2010 and joined the company’s Enterprise Decision Science team in 2021 while earning a master’s degree in Data Science from the UC Berkeley School of Information. He is passionate about building data-driven tools that enable customer and public safety.