AWS IoT Device Defender Announces ML Detect GA
Today, AWS announced the general availability of AWS IoT Device Defender Machine Learning Detect and Mitigation (ML Detect), a new feature that automatically detects operational and security anomalies at the device level, based on what it learns from historical device data. Customers can already use AWS IoT Device Defender’s Rules Detect feature to set static alarms manually. ML Detect makes this easier by automatically establishing your fleet’s expected behavior, so you don’t need an in-depth understanding of how your devices behave across metrics such as messages sent, disconnect frequency, and bytes in/out before you get started. ML Detect also updates the expected behavior automatically as new data trends emerge from seasonality and other changing factors.
This post presents an overview of the feature, highlights how some of our customers are benefiting from the device-level monitoring and operational reliability it provides, and walks you through the steps to get started.
Overview of AWS IoT Device Defender ML Detect
ML Detect uses machine learning to set thresholds for the expected behavior of your IoT devices. This makes AWS IoT Device Defender Detect easier to use, because you no longer need a comprehensive understanding of how your devices should behave (for example, disconnect frequency or number of messages sent) before configuring the feature. When an anomaly is identified, you can respond with a built-in mitigation action, such as quarantining a device.
AWS IoT Device Defender ML Detect includes the following capabilities:
- Supports six cloud-side metrics and seven device-side metrics for near-real-time continuous monitoring, and applies machine learning algorithms to infer whether metric datapoints are anomalous.
- Supports three confidence levels (HIGH, MEDIUM, and LOW) in ML alarm notifications.
- During the initial 14-day training period, the feature aggregates a minimum of 25,000 datapoints per metric across your devices; once the initial model is created, it begins identifying device behavior anomalies.
- After the initial model is created, the feature retrains the model daily, using a minimum of 25,000 datapoints per metric from the trailing 14 days, to refresh the expected device behaviors.
- Supports built-in mitigation actions so you can address device issues.
- Uses the same alarm mechanism as AWS IoT Device Defender Rules Detect, which includes Amazon SNS notification integration.
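To make the training requirements above concrete, here is a minimal Python sketch. The `aws:` identifiers are our mapping of the six cloud-side and seven device-side metrics to Device Defender metric names (worth checking against the metrics reference), and `metrics_ready_for_model` is a hypothetical helper that only applies the 25,000-datapoint and 14-day minimums stated in this post; it is not the service's actual training logic.

```python
"""Hypothetical helper applying the ML Detect training minimums described above."""

# Cloud-side metrics: collected by AWS IoT, no device-side implementation needed.
CLOUD_SIDE_ML_METRICS = (
    "aws:num-messages-sent",
    "aws:num-messages-received",
    "aws:message-byte-size",
    "aws:num-authorization-failures",
    "aws:num-connection-attempts",
    "aws:num-disconnects",
)

# Device-side metrics: reported by an agent running on the device.
DEVICE_SIDE_ML_METRICS = (
    "aws:all-bytes-out",
    "aws:all-bytes-in",
    "aws:all-packets-out",
    "aws:all-packets-in",
    "aws:num-listening-tcp-ports",
    "aws:num-listening-udp-ports",
    "aws:num-established-tcp-connections",
)

MIN_DATAPOINTS_PER_METRIC = 25_000  # minimum per metric, as stated in this post
TRAINING_WINDOW_DAYS = 14           # initial (and trailing) training window


def metrics_ready_for_model(datapoints: dict, days_collected: int) -> list:
    """Return the metrics that meet both minimums, sorted by name.

    `datapoints` maps a metric name to the number of datapoints collected
    across the fleet. This only mirrors the stated thresholds; the service
    itself decides when a model is actually built.
    """
    if days_collected < TRAINING_WINDOW_DAYS:
        return []
    return sorted(
        metric
        for metric, count in datapoints.items()
        if count >= MIN_DATAPOINTS_PER_METRIC
    )


if __name__ == "__main__":
    collected = {"aws:num-messages-sent": 31_000, "aws:num-disconnects": 4_200}
    print(metrics_ready_for_model(collected, days_collected=14))
    # prints ['aws:num-messages-sent']
```
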
Customers already benefiting from ML Detect
By using AWS IoT Device Defender ML Detect, our customers are providing more automated monitoring, troubleshooting and support to their end customers. The feature helps them meet their customer service commitments and improves the overall reliability of their system.
ERA Home Security: meeting customer service commitments with automated device monitoring and mitigation
ERA Home Security is a UK-based home security hardware and solution company that has been in business since 1838. They recently migrated their home-grown IoT infrastructure onto AWS IoT and started using AWS IoT Device Defender ML Detect to monitor their device fleet for connectivity issues and anomalous behaviors, such as tamper events. Since implementing AWS IoT Device Defender ML Detect in September 2020, ERA Home Security was able to improve their customer experience by meeting over 99.9% of their customer service commitments. If devices went offline, the ERA team was notified immediately and able to resolve issues quickly based on the alarm details. “The results are very promising and I can see us scaling up our usage of ML Detect in our production environment. Giving our highly technical staff the freedom to innovate on our customers’ behalf and leave the heavy lifting of devices mitigation automation to Device Defender ML Detect,” Jey Jeyasingam, Chief Technology Officer of ERA Home Security shared with AWS.
Jane: proactive device monitoring for better protection and care of customers
Jane is a smart living technology company based in Belgium that provides in-home health monitoring solutions to seniors and senior living communities. These solutions connect seniors with their care providers through dashboards and alerts for proactive and reactive wellness updates. As Jane brought its solutions to senior living communities, its monitoring devices experienced poor connectivity on local 3G networks, and reports of disconnect issues came in frequently, almost exclusively from customers. Since deploying AWS IoT Device Defender ML Detect, Jane can see which devices are dropping connections unexpectedly and provide more proactive support and troubleshooting for its customers. These improvements are critical as Jane expands its business in the B2B market. “Through AWS IoT Device Defender ML Detect, we can proactively monitor and protect our customers that contributes to the reliability of our system. This way our care-givers and seniors can sleep soundly,” Evert Van Cauwenberg, Chief Technology Officer of Jane, commented to AWS.
Getting started with ML Detect in the AWS IoT Console
You can use the AWS IoT console or the AWS CLI to create an ML-based Security Profile, which lets you monitor standard metrics across your entire device fleet or on selected device groups. The following steps show how to get started with ML Detect in the AWS IoT console:
Step 1: Enable ML Detect on your device group(s)
To start using ML Detect, you first create a Security Profile that uses machine learning to learn expected device behaviors by automatically creating models based on historical device metric data. The Security Profile can be assigned to a group of devices or all the devices in your fleet.
1. Set basic configurations
Open the AWS IoT console. In the navigation pane, choose Detect, Security Profiles, Create Security Profile, Create ML anomaly Detect profile.
- From the Target drop down, select device group(s) or all of your fleet devices for monitoring.
- For Security Profile name, enter a value.
- For Description (optional), enter a value that describes your security profile.
- For Select metric behaviors in Security Profile, clear any device metrics that you don’t want to monitor.
- (Optional) Set up SNS notifications so you can receive alarm notifications via email, text, or any incident response system you send alarm notifications to.
- For Topic, create a new SNS topic or choose an existing one from the dropdown.
- For Role, select a predefined role. If you don’t have one, you can set up a role in IAM (see the IAM role instructions for details).
Note: Cloud-side metrics will be collected from device connection and messaging logs from AWS IoT without requiring device-side implementation. Device-side metrics will require device-side implementation (e.g. AWS IoT Device Client or AWS IoT Device Defender sample agent integration with your device firmware).
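If you prefer the AWS CLI or an SDK, the console steps above map onto the `CreateSecurityProfile` API. The sketch below assumes the request shape of the boto3 `iot` client as we understand it (`behaviors[].criteria.mlDetectionConfig.confidenceLevel` for ML behaviors, `alertTargets.SNS` for notifications). It only builds the request dictionary, so it runs without AWS credentials; the helper names and the behavior naming scheme are our own illustration.

```python
"""Build a CreateSecurityProfile request for an ML Detect profile (sketch)."""

CONFIDENCE_LEVELS = ("HIGH", "MEDIUM", "LOW")


def ml_behavior(metric, confidence="HIGH", to_alarm=1, to_clear=1, suppress=False):
    """One ML behavior: no static threshold, only an ML confidence level."""
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
    # Hypothetical naming scheme: "aws:num-messages-sent" -> "num-messages-sent-ml-behavior".
    name = metric.split(":", 1)[-1] + "-ml-behavior"
    return {
        "name": name,
        "metric": metric,
        "criteria": {
            "consecutiveDatapointsToAlarm": to_alarm,
            "consecutiveDatapointsToClear": to_clear,
            "mlDetectionConfig": {"confidenceLevel": confidence},
        },
        "suppressAlerts": suppress,
    }


def security_profile_request(profile_name, metrics, description=None,
                             sns_topic_arn=None, sns_role_arn=None):
    """Keyword arguments intended for iot.create_security_profile(**request)."""
    request = {
        "securityProfileName": profile_name,
        "behaviors": [ml_behavior(m) for m in metrics],
    }
    if description:
        request["securityProfileDescription"] = description
    # SNS notifications are optional and need both a topic and a role.
    if sns_topic_arn and sns_role_arn:
        request["alertTargets"] = {
            "SNS": {"alertTargetArn": sns_topic_arn, "roleArn": sns_role_arn}
        }
    return request


if __name__ == "__main__":
    req = security_profile_request(
        "ML_Detect_profile",
        ["aws:num-messages-sent", "aws:num-authorization-failures"],
        description="ML Detect on cloud-side metrics",
    )
    print([b["name"] for b in req["behaviors"]])
```

With boto3 you would pass the result to `boto3.client("iot").create_security_profile(**req)` and then target devices with `attach_security_profile(securityProfileName=..., securityProfileTargetArn=...)`, supplying your own thing group (or all-things) target ARN, SNS topic, and role.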
2. Edit metric behaviors
Choose Next to go to Edit metric behaviors. This page lets you customize the default metric behavior settings from the previous step.
3. Review configuration
After editing metric behaviors, choose Next to go to Review configuration. Confirm the configuration of the metric behaviors in your ML Security Profile, or return to the previous steps to make any edits.
Once you click Create ML Security Profile, your ML model(s) will start building.
Step 2: Behaviors and ML training
Now that the ML Security Profile is created, let’s check the status of its models.
In the AWS IoT console, go to Defend > Detect > Security Profiles, select your profile, and choose the Behaviors and ML training tab. This tab shows the status of ML model training. For each behavior you can see its name, such as Messages_sent_ML_behavior, its threshold type (ML), and its model status (initially ‘Pending build’).
Once the model status becomes ‘Active’, the model starts inferring anomalies in your device data and updates itself based on new data patterns across your devices.
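The Behaviors and ML training tab has an API counterpart, `GetBehaviorModelTrainingSummaries`. Assuming its response lists `summaries` entries with `behaviorName`, `modelStatus` (`PENDING_BUILD`, `ACTIVE`, or `EXPIRED`), and `datapointsCollectionPercentage`, a minimal sketch for checking whether every model is active could look like this. The sample response is made up for illustration, and a real response may be paginated with `nextToken`.

```python
"""Summarize ML model training status from a GetBehaviorModelTrainingSummaries response (sketch)."""


def pending_models(response):
    """Return (behavior name, % of datapoints collected) for models that are not ACTIVE."""
    return [
        (s["behaviorName"], s.get("datapointsCollectionPercentage", 0.0))
        for s in response.get("summaries", [])
        if s.get("modelStatus") != "ACTIVE"
    ]


def all_models_active(response):
    """True only if the response has summaries and every model is ACTIVE."""
    summaries = response.get("summaries", [])
    return bool(summaries) and not pending_models(response)


if __name__ == "__main__":
    # Illustrative response shape only; real values would come from
    # boto3.client("iot").get_behavior_model_training_summaries(securityProfileName=...).
    sample = {
        "summaries": [
            {"behaviorName": "Messages_sent_ML_behavior", "modelStatus": "PENDING_BUILD",
             "datapointsCollectionPercentage": 42.5},
            {"behaviorName": "Authorization_failures_ML_behavior", "modelStatus": "ACTIVE",
             "datapointsCollectionPercentage": 100.0},
        ]
    }
    print(pending_models(sample))
    print(all_models_active(sample))
```
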
Step 3: Review your ML Detect alarms
After the initial ML models are built (this usually takes 14 days with sufficient training data), they begin evaluating incoming data, and you can view the Detect alarms they infer on an ongoing basis.
1. In the AWS IoT console, go to Defend > Detect > Alarms. The page has two tabs: Active and History.
As the example Alarms tab below shows, thing (device) ‘ddml1’, which is attached to the ML_Detect_profile Security Profile, has made connection attempts and incurred authorization failures.
2. On the History tab, you can see all alarm events from the past 24 hours (use the dropdown to display up to 30 days).
The green line represents cleared alarms and the red line represents devices still in alarm. Hover over the lines and dots to see the date, time, and status of the alarms at each timestamp, as in the example shown below.
3. Scroll down the same page to view additional details about these past alarms and their states.
Here you can see that thing (device) ‘ddml12’ received an unusually high number of messages, after which its message volume returned to normal and its alarm state was cleared.
4. Dive deeper by clicking on the thing name. Here, we select ‘ddml12’ and see the graph in the Defender metrics section (see below).
It shows a spike in messages received, which triggered an alarm. However, the alarm cleared thereafter.
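The Alarms pages can also be queried programmatically. This sketch assumes that the `ListViolationEvents` request accepts `startTime`, `endTime`, `securityProfileName`, and `behaviorCriteriaType="MACHINE_LEARNING"`, and that each returned event carries `thingName`, `violationEventType` (for example `in-alarm` or `alarm-cleared`), and `violationEventTime`. It builds the request for the History window and reduces events to each thing's latest state, roughly what the red and green lines show. The helper names and sample events are hypothetical.

```python
"""Query ML violation history and find things still in alarm (sketch)."""
from datetime import datetime, timedelta, timezone


def violation_events_request(profile_name, days=1, now=None):
    """Keyword arguments intended for iot.list_violation_events(**request)."""
    # The History tab shows 24 hours by default and up to 30 days.
    if not 1 <= days <= 30:
        raise ValueError("days must be between 1 and 30")
    end = now or datetime.now(timezone.utc)
    return {
        "startTime": end - timedelta(days=days),
        "endTime": end,
        "securityProfileName": profile_name,
        "behaviorCriteriaType": "MACHINE_LEARNING",
    }


def latest_state_per_thing(events):
    """Map each thing name to its most recent violation event type."""
    latest = {}
    for event in sorted(events, key=lambda e: e["violationEventTime"]):
        latest[event["thingName"]] = event["violationEventType"]
    return latest


def things_still_in_alarm(events):
    """Things whose latest event is 'in-alarm' (the red line in the History chart)."""
    return sorted(t for t, state in latest_state_per_thing(events).items()
                  if state == "in-alarm")


if __name__ == "__main__":
    t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
    events = [  # illustrative events, not real data
        {"thingName": "ddml12", "violationEventType": "in-alarm", "violationEventTime": t0},
        {"thingName": "ddml12", "violationEventType": "alarm-cleared",
         "violationEventTime": t0 + timedelta(minutes=30)},
        {"thingName": "ddml1", "violationEventType": "in-alarm",
         "violationEventTime": t0 + timedelta(minutes=10)},
    ]
    print(things_still_in_alarm(events))
    # prints ['ddml1']
```
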
Step 4: Fine-tune your ML Detect alarms
Once your models are in the Active state, you can update your Security Profile’s ML behavior settings to try out different configurations. Three confidence levels are available: High, Medium, and Low.
- High confidence means low sensitivity in anomalous behavior evaluation and fewer alarms.
- Medium confidence means medium sensitivity and a moderate number of alarms.
- Low confidence means high sensitivity and more alarms.
1. To adjust your notifications, go to Defend > Detect > Security Profiles in the AWS IoT console, select the radio button for the Security Profile you want to modify, and then choose Actions > Edit.
2. On the Set basic configurations page, you can select and clear metrics, and change the profile’s target to other device groups or to all devices.
3. Click Next to go to the edit metric behaviors screen.
For example, we change the number of Authorization failures datapoints required to trigger an alarm from the default of 1 to 3 and set the confidence level to Low. With these parameters, a device must produce three consecutive anomalous Authorization failures datapoints, rather than one, before an alarm is raised, and anomalies are evaluated at the more sensitive Low confidence level.
Similarly, you can adjust other settings for your devices and their alarms, such as the datapoints required to clear an alarm and whether notifications are suppressed.
4. Once complete, choose Update ML Security Profile to save the new settings. To review the changes, choose the profile and open the Behaviors and ML training tab.
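The Authorization failures example corresponds to changing two fields of one behavior's `criteria` before calling `UpdateSecurityProfile`, which, as we understand it, replaces the profile's whole `behaviors` list. This minimal sketch assumes the same behavior dictionary shape as `CreateSecurityProfile`; `tune_behavior` and the behavior name are hypothetical, and the sensitivity table restates the three confidence levels described above.

```python
"""Tune one ML behavior before an UpdateSecurityProfile call (sketch)."""
import copy

# Confidence level -> (sensitivity, relative alarm volume), as described in this post.
SENSITIVITY = {
    "HIGH": ("low", "fewer alarms"),
    "MEDIUM": ("medium", "moderate number of alarms"),
    "LOW": ("high", "more alarms"),
}


def tune_behavior(behaviors, name, to_alarm=None, to_clear=None, confidence=None):
    """Return a copy of `behaviors` with the named ML behavior's criteria changed."""
    if confidence is not None and confidence not in SENSITIVITY:
        raise ValueError(f"unknown confidence level: {confidence}")
    updated = copy.deepcopy(behaviors)  # leave the caller's list untouched
    for behavior in updated:
        if behavior["name"] != name:
            continue
        criteria = behavior["criteria"]
        if to_alarm is not None:
            criteria["consecutiveDatapointsToAlarm"] = to_alarm
        if to_clear is not None:
            criteria["consecutiveDatapointsToClear"] = to_clear
        if confidence is not None:
            criteria["mlDetectionConfig"] = {"confidenceLevel": confidence}
        return updated
    raise KeyError(f"no behavior named {name!r}")


if __name__ == "__main__":
    behaviors = [{
        "name": "Authorization_failures_ML_behavior",  # hypothetical name
        "metric": "aws:num-authorization-failures",
        "criteria": {
            "consecutiveDatapointsToAlarm": 1,
            "consecutiveDatapointsToClear": 1,
            "mlDetectionConfig": {"confidenceLevel": "HIGH"},
        },
    }]
    tuned = tune_behavior(behaviors, "Authorization_failures_ML_behavior",
                          to_alarm=3, confidence="LOW")
    print(tuned[0]["criteria"])
    # Then, for example: boto3.client("iot").update_security_profile(
    #     securityProfileName="ML_Detect_profile", behaviors=tuned)
```
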
Step 5: Mitigate identified device issues
1. Create a quarantine thing group
Before we set up quarantine mitigation actions, let’s create a quarantine group where we will move the device in alarm. You can also use any existing group if you have one.
Go to Manage > Thing groups > Create thing group and give your group a suitable name; we named ours ‘Quarantine_group’. Then choose Create thing group at the bottom of the page. Under Thing group, Security, apply the following policy to the thing group.
2. Create a mitigation action
Once we have the group created, we can set up a mitigation action that moves devices in alarm into ‘Quarantine_group.’
Go to Defend > Mitigation actions and choose Create.
Let’s create the following setup:
- Action name: Give the Action a name, such as Quarantine_action.
- Action type: We will select ‘Add things to thing group (Audit or Detect mitigation).’
- Action execution role: choose Create Role or select an existing role if you have created one earlier.
- Parameters: select Thing groups. We used Quarantine_group, which we created earlier.
Once all the sections are completed, save this action.
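Both parts of this setup have API counterparts: `CreateThingGroup` for the quarantine group and `CreateMitigationAction` with `actionParams.addThingsToThingGroupParams` for the action. This sketch assumes those request shapes and only builds the two requests. The role ARN is a placeholder for a role that AWS IoT can use to run the action, and the thing group policy from the previous step is not covered.

```python
"""Requests for a quarantine thing group and a matching mitigation action (sketch)."""

QUARANTINE_GROUP = "Quarantine_group"
QUARANTINE_ACTION = "Quarantine_action"


def thing_group_request(group_name=QUARANTINE_GROUP):
    """Keyword arguments intended for iot.create_thing_group(**request)."""
    return {"thingGroupName": group_name}


def quarantine_action_request(role_arn, group_name=QUARANTINE_GROUP,
                              action_name=QUARANTINE_ACTION):
    """Keyword arguments intended for iot.create_mitigation_action(**request).

    Uses the 'Add things to thing group' action type. overrideDynamicGroups
    is set False here as an assumption; set it True if devices that belong
    to dynamic thing groups should also be moved.
    """
    if not role_arn.startswith("arn:"):
        raise ValueError("role_arn must be an IAM role ARN")
    return {
        "actionName": action_name,
        "roleArn": role_arn,
        "actionParams": {
            "addThingsToThingGroupParams": {
                "thingGroupNames": [group_name],
                "overrideDynamicGroups": False,
            }
        },
    }


if __name__ == "__main__":
    print(thing_group_request())
    print(quarantine_action_request("arn:aws:iam::123456789012:role/QuarantineRole"))
```
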
3. Mitigate Detect alarms
Go to Defend > Detect > Alarms.
- Under the Active tab, we can see that multiple devices are in the alarm state and are filling the Published alarms page with too many notifications. Let’s select a device and apply a mitigation action.
- Select the device that you want to move to quarantine and choose ‘Start Mitigation Actions.’
- After you choose ‘Start Mitigation Actions’, you are presented with the mitigation actions you created beforehand. Let’s use the one we created earlier.
Now that the device is isolated in ‘Quarantine_group’, we can investigate the root cause of the issue. Once the investigation is complete, we can move the device out of quarantine or take further action.
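Applying the action from the Alarms page corresponds to `StartDetectMitigationActionsTask`. This minimal sketch assumes that API takes a `taskId`, a `target` of violation IDs, the action names, and a `clientRequestToken`, and that `ListActiveViolations` returns `activeViolations` entries with `thingName` and `violationId`. The helper names and sample violation IDs are invented for illustration.

```python
"""Start the quarantine mitigation for one thing's active violations (sketch)."""
import uuid


def active_violation_ids(response, thing_name):
    """Violation IDs for `thing_name` from a ListActiveViolations response."""
    return [
        v["violationId"]
        for v in response.get("activeViolations", [])
        if v.get("thingName") == thing_name
    ]


def mitigation_task_request(violation_ids, action_name="Quarantine_action"):
    """Keyword arguments intended for iot.start_detect_mitigation_actions_task(**request)."""
    if not violation_ids:
        raise ValueError("no active violations to mitigate")
    return {
        "taskId": f"quarantine-{uuid.uuid4().hex}",  # unique ID for this task
        "target": {"violationIds": list(violation_ids)},
        "actions": [action_name],
        "includeOnlyActiveViolations": True,
        "clientRequestToken": str(uuid.uuid4()),     # idempotency token
    }


if __name__ == "__main__":
    sample = {"activeViolations": [  # illustrative response shape only
        {"thingName": "ddml1", "violationId": "example-violation-id"},
        {"thingName": "ddml12", "violationId": "another-violation-id"},
    ]}
    req = mitigation_task_request(active_violation_ids(sample, "ddml1"))
    print(req["target"], req["actions"])
```
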
You can also set up everything we’ve walked through above with the AWS CLI if you want to automate AWS IoT Device Defender ML Detect workflows in your DevOps pipeline. For instructions, refer to the Detect commands documentation or the sample code on GitHub.
With AWS IoT Device Defender ML Detect, you can create Security Profiles that include ML models of expected device behaviors built on historical device data automatically, and assign these profiles to a group of devices or all the devices in your fleet. AWS IoT Device Defender then identifies deviations and triggers alarms using the ML models, making it easy for you to monitor device anomalies and take mitigating actions.
About the authors
Syed Rehan is a Sr. Specialist Solutions Architect at Amazon Web Services based in London. He supports customers globally as a lead IoT Solutions Architect. Syed has in-depth knowledge of IoT and the cloud, and works with global customers ranging from start-ups to enterprises to help them build IoT solutions with the AWS ecosystem.
Cathy Lu is a Sr. Product Manager at Amazon Web Services and is based in Seattle. She oversees the AWS IoT Device Defender service on product strategy, roadmap planning, business analysis and insights, customer engagement, and other product management areas. Cathy led the launch of several fast-growing security products in her career.