Real-time anomaly detection for Amazon Connect call quality using Amazon OpenSearch
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details.
If your contact center is serving calls over the internet, network metrics like packet loss, jitter, and round-trip time are key to understanding call quality. In the post Easily monitor call quality with Amazon Connect, we introduced a solution that captures real-time metrics from the Amazon Connect softphone, stores them in Amazon OpenSearch Service, and creates easily understandable dashboards using Kibana. Examining these metrics and implementing rule-based alerting can be valuable. However, as your contact center scales, outliers becomes harder to detect across broad aggregations. For example, average packet loss for each of your sites may be below a rule threshold, but individual agent issues might go undetected.
The high cardinality anomaly detection feature of Amazon ES is a machine learning (ML) approach that can solve this problem. You can streamline your operational workflows by detecting anomalies for individual agents across multiple metrics. Alerts allow you to proactively reach out to the agent to help them resolve issues as they’re detected or use the historical data in Amazon ES to understand the issue. In this post, we give a four-step guide to detecting an individual agent’s call quality anomalies.
Anomaly detection in four stages
If you have an Amazon Connect instance, you can deploy the solution and follow along with this post. There are four steps to getting started with using anomaly detection to proactively monitor your data:
- Create an anomaly detector.
- Observe the results.
- Configure alerts.
- Tune the detector.
Creating an anomaly detector
A detector is the component used to track anomalies in a data source in real time. Features are the metrics the detector monitors in that data.
Our solution creates a call quality metric detector when deployed. This detector finds potential call quality issues by monitoring network metrics for agents’ calls, with features for round trip time, packet loss, and jitter. We use high cardinality anomaly detection to monitor anomalies across these features and identify agents who are having issues. Getting these granular results means that you can effectively isolate trends or individual issues in your call center.
We can observe active detectors from Kibana’s anomaly detection dashboard. The graphs in Kibana show live anomalies, historical anomalies, and your feature data. You can determine the severity of an anomaly by the anomaly grade and the confidence of the detector. The following screenshot shows a heat map of anomalies detected, including the anomaly grade, confidence level, and the agents impacted.
We can choose any of the tiles in the heat map to drill down further and use the feature breakdown tab to see more details. For example, the following screenshot shows the feature breakdown of one the tiles from the agent john-doe. The feature breakdown shows us that the round trip time, jitter, and packet loss values spiked during this time period.
To see if this is an intermittent issue for this agent, we have two approaches. First, we can check the anomaly occurrences for the agent to see if there is a trend. In the following screenshot, John Doe had 12 anomalies over the last day, starting around 3:25 PM. This is a clear sign to investigate.
Alternatively, referencing the heat map, we can look to see if there is a visual trend. Our preceding heat map shows a block of anomalies around 9:40 PM on November 23—this is a reason to look at other broader issues impacting these agents. For instance, they might all be at the same site, which suggests we should look at the site’s network health.
We can also configure each detector with alerts to send a message to Amazon Chime or Slack. Alternatively, you can send alerts to an Amazon Simple Notification Service (Amazon SNS) topic for emails or SMS. Alerting an operations engineer or operations team allows you to investigate these issues in real time. For instructions on configuring these alerts, see Alerting for Amazon OpenSearch Service.
Tuning the detector
It’s simple to start monitoring new features. You can add them to your model with a few clicks. To get the best results from your data, you can tailor the time window to your business. These time intervals are referred to as window size. The window size effects how quickly the algorithm adjusts to your data. Tuning the window size to match your data allows you to improve detector performance. For more information, see Anomaly Detection for Amazon OpenSearch Service.
In this post, we’ve shown the value of high cardinality anomaly detection for Amazon ES. To get actionable insights and preconfigured anomaly detection for Amazon Connect metrics, you can deploy the call quality monitoring solution.
The high cardinality anomaly detection feature is available on all Amazon OpenSearch domains running OpenSearch 7.9 or greater. For more information, see Anomaly Detection for Amazon OpenSearch Service.
About the Authors
Kun Qian is a Specialist Solutions Architect on the Amazon Connect Customer Solutions Acceleration team. He solving complex problems with technology. In his spare time he loves hiking, experimenting in the kitchen and meeting new people.
Rob Taylor is a Solutions Architect helping customers in Australia designing and building solutions on AWS. He has a background in theoretical computer science and modeling. Originally from the US, in his spare time he enjoys exploring Australia and spending time with his family.