Enable compliance and mitigate IoT risks with automated incident response
Internet of Things (IoT) devices can present unique security challenges, ranging from malware and DDoS attacks to logical or physical compromise. You can prepare for such events by having a process in place to mitigate these risks when they occur. The IoT Lens of the Well-Architected Framework provides high-level guidance on how to be prepared for incidents that impact your IoT devices. In addition, various compliance frameworks, such as the Payment Card Industry Data Security Standard (PCI DSS), the Health Insurance Portability and Accountability Act (HIPAA), and NIST Special Publication 800-53, include requirements to maintain actionable incident response plans for systems.
AWS IoT Device Defender can audit, monitor, and detect potential security incidents. These capabilities help secure IoT application deployments using Amazon Web Services (AWS) IoT Core. However, a complete incident response typically requires properly tracking the incident, coordinating response across multiple teams, and ensuring execution of predefined incident response runbooks. This post provides a working example of preparing for and automating the incident response workflow for AWS IoT-managed devices. It helps to quickly mitigate risks and respond to security events that could arise throughout your IoT infrastructure.
The following solution provides an example of automating your response to incidents involving IoT devices by implementing AWS IoT Device Defender and AWS Systems Manager (AWS SSM) Incident Manager. Use AWS CloudFormation to deploy this automated solution to manage your IoT incident response as code.
IoT Response Automated Workflow
- AWS IoT Device Defender detects a Security Profile violation on an IoT device and sends an Amazon Simple Notification Service (Amazon SNS) alert.
- The alert invokes an AWS Lambda Function to initiate the incident process in Incident Manager, a capability of AWS Systems Manager, using a predefined response plan.
- The Incident Manager response plan starts the incident response workflow using a custom runbook (automation document) for handling IoT incidents.
- A Lambda function is invoked to start containment procedures, which adds affected thing(s) to a Quarantine Thing Group where they can be isolated using AWS IoT Core Policies. The Deployment Steps of this blog post contain instructions on how to create a static Thing Group for Quarantining devices.
- The second step in the runbook notifies the predetermined point(s) of contact of the IoT incident. A team member can then acknowledge the incident and begin the mitigation and analysis procedures defined as instructions in the runbook.
- An escalation point of contact engages within a configured duration if the incident is not acknowledged.
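The first two steps of the workflow above can be sketched as a small Lambda-style handler. This is a minimal illustration, not the code shipped with the solution. It assumes the SNS message body is JSON with `thingName`, `violationEventType`, and `securityProfileName` fields, that `incidents_client` is something like `boto3.client("ssm-incidents")`, and that the source label `com.example.iot-device-defender` is a hypothetical value you would replace with your own.

```python
import json
from datetime import datetime, timezone


def build_incident_request(sns_event, response_plan_arn):
    """Turn a Device Defender SNS alert into start_incident keyword arguments.

    Returns None when the alert is not a new violation (for example,
    an "alarm-cleared" notification).
    """
    record = sns_event["Records"][0]["Sns"]
    violation = json.loads(record["Message"])

    # Assumed field names; check them against the alerts your profile sends.
    if violation.get("violationEventType") != "in-alarm":
        return None

    thing = violation["thingName"]
    profile = violation.get("securityProfileName", "unknown-profile")
    return {
        "responsePlanArn": response_plan_arn,
        "title": f"IoT violation on {thing} ({profile})",
        "triggerDetails": {
            "source": "com.example.iot-device-defender",  # hypothetical label
            "timestamp": datetime.now(timezone.utc),
            "rawData": record["Message"],
        },
    }


def handle_alert(sns_event, incidents_client, response_plan_arn):
    """Start an incident for a violation alert; returns the API response or None."""
    request = build_incident_request(sns_event, response_plan_arn)
    if request is None:
        return None
    return incidents_client.start_incident(**request)
```

Passing the client in as an argument keeps the parsing logic testable without AWS credentials. In a deployed function the response plan ARN would more likely come from an environment variable.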
IoT Incident Response Lifecycle
Preparation is critical for effectively responding to an incident when it happens and enabling faster mitigation. It involves defining the personnel who will respond to an incident, the roles and responsibilities of those involved, ensuring necessary tools are available, enabling logs, and automating repetitive tasks.
The example solution creates an AWS Systems Manager Automation document representing a runbook for IoT-specific responses. A runbook is the documented form of an organization’s procedures for conducting a series of tasks and can involve both manual and automated actions. This document is standardized in YAML and can be modified, updated, and version controlled. It orchestrates automation with human activities in response to an IoT device incident. The runbook in the provided example should be tailored based on your specific requirements and use cases.
Any deviation from a device’s normal security baseline can be considered a security incident. This example uses AWS IoT Device Defender to detect those deviations using preconfigured security profiles that define how a device should behave. The example implements incident response for the following common types of scenarios:
- Unauthorized configurations (rule-based) – A secure IoT device should limit accessible TCP/UDP ports to only those necessary. Any unexpected TCP/UDP service listening on a device indicates a security risk due to compromise or misconfiguration. A rule-based security profile monitors such events.
- Anomalies in behavior (machine learning-based) – AWS IoT Device Defender can detect deviations from normal device behavior through machine learning. This capability covers metrics such as connection attempts, network traffic, and authorization failures. A machine learning-based security profile monitors such events.
Behavior that deviates from a defined security profile in either scenario of this solution will trigger a violation in AWS IoT Device Defender, automatically initiating an incident response plan.
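To make the two scenarios concrete, the sketch below builds one rule-based behavior and one machine learning-based behavior in the shape accepted by the AWS IoT `create_security_profile` API. The example solution's CloudFormation template already creates its own profiles, so this is only an illustration. The behavior names, the `DeviceMLBaseline` profile name, and the allowed port list are assumptions. `iot_client` is expected to behave like `boto3.client("iot")`.

```python
def rule_based_port_behavior(allowed_ports):
    """Behavior that alarms when a device listens on a TCP port outside the set."""
    return {
        "name": "AllowedListeningTcpPorts",  # hypothetical behavior name
        "metric": "aws:listening-tcp-ports",
        "criteria": {
            "comparisonOperator": "in-port-set",
            "value": {"ports": sorted(set(allowed_ports))},
        },
    }


def ml_behavior(metric, confidence="HIGH"):
    """Behavior that lets ML Detect learn the normal range for one metric."""
    return {
        "name": f"ML-{metric.split(':')[-1]}",
        "metric": metric,
        "criteria": {"mlDetectionConfig": {"confidenceLevel": confidence}},
    }


def create_profiles(iot_client, allowed_ports, sns_topic_arn, role_arn):
    """Create one rule-based and one ML-based profile that alert to SNS."""
    alert_targets = {
        "SNS": {"alertTargetArn": sns_topic_arn, "roleArn": role_arn}
    }
    iot_client.create_security_profile(
        securityProfileName="DeviceRuleBaseline",
        behaviors=[rule_based_port_behavior(allowed_ports)],
        alertTargets=alert_targets,
    )
    iot_client.create_security_profile(
        securityProfileName="DeviceMLBaseline",  # hypothetical profile name
        behaviors=[
            ml_behavior("aws:num-authorization-failures"),
            ml_behavior("aws:num-connection-attempts"),
        ],
        alertTargets=alert_targets,
    )
```

A profile only takes effect once it is attached to a target such as a thing group, which this sketch leaves out.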
Containment, Analysis and Recovery
For this solution, AWS SSM Incident Manager initiates a response plan using a predefined SSM automation document for IoT security violations. The automation document consists of multiple steps to be taken as a response, which can involve automated and manual actions.
The first step in the example SSM automation document invokes a Lambda function that performs containment actions. In this example solution, the function automatically adds the IoT device to a separate Quarantine thing group, which isolates the device and prepares it for further investigation and mitigation.
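A containment function along these lines could be quite small. The sketch below is an assumption about its shape rather than the solution's actual code. It expects the automation step to pass a payload with a hypothetical `thingName` key, and `iot_client` to behave like `boto3.client("iot")`.

```python
def quarantine_thing(iot_client, thing_name, group_name="QUARANTINED"):
    """Add one thing to the static quarantine thing group."""
    iot_client.add_thing_to_thing_group(
        thingGroupName=group_name,
        thingName=thing_name,
    )
    return {"thingName": thing_name, "quarantineGroup": group_name}


def containment_handler(event, iot_client):
    """Entry point sketch: 'thingName' is a hypothetical payload key."""
    return quarantine_thing(iot_client, event["thingName"])
```

Returning the thing and group names gives later runbook steps something to reference in notifications.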
Analysis and Mitigation
After containment, the incident response plan orchestrates the manual steps of the response, such as notifying appropriate personnel and providing instructions for investigation and resolution. Next, the response plan engages the predefined security point(s) of contact. Those contacts receive a new incident email notification and acknowledge it.
Investigating any incident typically involves determining basic answers to who, what, when, where, and why. Detecting compromised data is essential for IoT incident response to confirm data validity and accuracy.
Perform forensic analysis on the device in either online or offline mode.
- Online analysis. SSH access to the device can optionally be enabled through an AWS IoT secure tunnel, so that a security engineer can access and evaluate the device.
- Offline analysis. Analysis can be performed using collected logs, data, and messages sent to IoT topics from the device.
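For the online option, a tunnel can be opened programmatically through the AWS IoT secure tunneling API. The following is a minimal sketch, assuming `tunnel_client` behaves like `boto3.client("iotsecuretunneling")` and that the device runs an agent able to accept the SSH service. The 60-minute lifetime is an arbitrary example value.

```python
def open_ssh_tunnel(tunnel_client, thing_name, lifetime_minutes=60):
    """Open a secure tunnel to a thing for SSH; returns (tunnel_id, source_token)."""
    response = tunnel_client.open_tunnel(
        description=f"Incident investigation for {thing_name}",
        destinationConfig={"thingName": thing_name, "services": ["SSH"]},
        timeoutConfig={"maxLifetimeTimeoutMinutes": lifetime_minutes},
    )
    # The source access token is what the investigator's local proxy uses.
    return response["tunnelId"], response["sourceAccessToken"]
```

Keeping the tunnel lifetime short limits how long the investigation channel stays open on a device that may be compromised.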
The incident response in this solution provides links and other important information under Related Items of the incident when opening the Incident Manager console. This feature enables quick access for responders to the information they need.
Direct links to query logs collected on the IoT devices in Amazon CloudWatch Logs Insights are included.
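The same kind of query can also be run outside the console. The sketch below builds a simple CloudWatch Logs Insights query that matches a thing name anywhere in the log message and starts it through the `start_query` API. The log group name is whatever your devices log to, which is not specified here, and `logs_client` is assumed to behave like `boto3.client("logs")`.

```python
import time


def build_thing_log_query(thing_name, limit=100):
    """Logs Insights query for recent messages that mention one thing."""
    return (
        "fields @timestamp, @message"
        f' | filter @message like "{thing_name}"'
        " | sort @timestamp desc"
        f" | limit {limit}"
    )


def start_thing_log_query(logs_client, log_group, thing_name, lookback_seconds=3600):
    """Start the query over the last lookback_seconds; returns the query ID."""
    now = int(time.time())
    response = logs_client.start_query(
        logGroupName=log_group,
        startTime=now - lookback_seconds,
        endTime=now,
        queryString=build_thing_log_query(thing_name),
    )
    return response["queryId"]
```

The returned query ID can then be polled with `get_query_results` until the query completes.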
The recovery strategy for IoT incident response must consider several factors:
- Is the device mission critical? What happens if it becomes completely unavailable?
- Are there redundant devices that mitigate this unavailability?
- Does the device contain sensitive data? What is the risk of keeping it online?
- Is the device currently operating and online? Can the resource be physically accessed?
These factors must be considered based on IoT use case(s) and documented as part of the incident response runbook before an incident occurs.
After resolving any critical incident, a post-incident analysis should document the root cause, update stakeholders, identify the impact, and capture lessons learned. This analysis provides feedback for improving an organization’s incident response and identifies opportunities to update the response process.
Upon resolution of an incident, AWS SSM Incident Manager prompts you to create a post-incident analysis with information on the event. Click Create analysis to begin the process.
Deployment Steps for Automated Solution
This section reviews the steps to implement the example solution using AWS CloudFormation.
Set up AWS Systems Manager (SSM) Incident Manager
If this is the first time you are using SSM Incident Manager in the account where you will deploy this solution, follow these steps to configure the service.
- Open the Incident Manager console
- On the Incident Manager service homepage, select Get prepared.
- Choose General settings.
- Read the onboarding acknowledgment. If you agree to Incident Manager’s terms and conditions, check the I have read and agree to the AWS Systems Manager Incident Manager terms and conditions checkbox. Then select Next.
- Set up the replication using either an AWS Owned or a Customer Managed AWS Key Management Service (AWS KMS) key. All Incident Manager resources are encrypted. To learn more about how your data is encrypted, see Data Protection in Incident Manager. See Using the Incident Manager replication set for more information about your replication set.
- If you want to use the AWS Owned key, choose Use AWS owned key, and then choose Create.
- If you want to use a Customer Managed AWS KMS key, choose Choose a different AWS KMS key (advanced).
- Your current Region appears as the first Region in your replication set. Search for an AWS KMS key in your account. If you have not created a key or need to create a new one, select the Create an AWS KMS key button.
- To add more Regions to your replication set, choose Add Region.
- Select the Create button to create your replication set and contacts. To learn more about replication sets and resiliency, see Resilience in AWS Systems Manager Incident Manager.
Create an AWS Simple Systems Manager (SSM) Contact
- After logging into an AWS account with the appropriate permissions, go to the AWS Systems Manager Incident Manager console
- Select Contacts, and then select Create contact
- Choose the Create Contact button.
- Type the full name of the contact and provide a unique and identifiable alias.
- Define a Contact channel. We recommend having two or more different types of contact channels.
- Choose the type: email, SMS, or voice.
- Enter an identifiable name for the contact channel.
- Provide the contact channel details, such as email
- Define the Engagement Plan
- In the Contact channel name drop-down, select one of the contact channels you defined above, then enter the Engagement time: the number of minutes after the start of the stage at which this contact should be notified
- Click Add engagement to optionally add any other contact channel you defined, along with its Engagement time
- Click Create to create the contact. The contact channel(s) will need to be activated through confirmation email/SMS/voice to be fully functional.
- Copy the Amazon Resource Name (ARN) of the contact you created for use when launching the SAM application
Create an IoT Thing Group for Quarantined Things
- Go to the AWS IoT console and select Manage > Thing Groups.
- Under Create Thing Group, select Create a static thing group, then click Next.
- Enter the name QUARANTINED for the Thing group name, and leave other options in the default state.
- Select the Create thing group button.
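The console steps above can also be scripted, and the same script can attach an AWS IoT Core policy that isolates members of the group. This is a rough sketch under several assumptions. The policy name `QuarantineDenyMessaging` and the choice to deny publish, subscribe, and receive are illustrative only, and `iot_client` is expected to behave like `boto3.client("iot")`.

```python
import json


def quarantine_policy_document():
    """Illustrative policy: block MQTT messaging for quarantined things."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": ["iot:Publish", "iot:Subscribe", "iot:Receive"],
                "Resource": "*",
            }
        ],
    }


def create_quarantine_group(
    iot_client, group_name="QUARANTINED", policy_name="QuarantineDenyMessaging"
):
    """Create the static thing group and attach the deny policy to it."""
    group = iot_client.create_thing_group(thingGroupName=group_name)
    # create_policy fails if the policy name already exists in the account.
    iot_client.create_policy(
        policyName=policy_name,
        policyDocument=json.dumps(quarantine_policy_document()),
    )
    iot_client.attach_policy(policyName=policy_name, target=group["thingGroupArn"])
    return group["thingGroupArn"]
```

Which actions to deny is a design decision. Denying all publishing also stops the device from reporting Device Defender metrics, so you may prefer to scope the `Resource` element more narrowly.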
Prerequisites for Launching the CloudFormation Stack
The code in GitHub provides a working example of the solution using the AWS Serverless Application Model (SAM). Ensure you have met the following prerequisites to deploy the solution using SAM:
- An AWS Account
- AWS Command Line Interface (AWS CLI) installed and configured (see the AWS CLI user guide)
- AWS Serverless Application Model (SAM) CLI installed (see the AWS SAM overview and user guide)
- An Amazon Simple Storage Service (Amazon S3) bucket for storing SAM-generated packaged templates (see the Amazon S3 overview)
Launching the CloudFormation Stack
- Initialize the SAM project from the GitHub source repository
sam init --location gh:aws-samples/aws-iot-incident-response-example
- In the file samconfig.toml, modify the ssmEngagementContact field with the ARN of the contact you created in previous step “Create an AWS Simple Systems Manager (SSM) Contact”
- Package the SAM application
sam package \
--template-file template.yaml \
--s3-bucket <S3_BUCKET_NAME> \
--output-template-file packaged-template.yaml
- Deploy the SAM application
sam deploy \
--template-file packaged-template.yaml \
--stack-name aws-iot-incident-mgmt
After you launch the deployment, the stack can take 3 to 5 minutes to deploy. When deployment finishes, the new CloudFormation stack appears in the AWS CloudFormation console with a status of CREATE_COMPLETE.
Integrating IoT Devices with the Automated Incident Response Workflow
This example solution deploys an incident response workflow which, by default, is invoked when any IoT device violates the Device Defender security profiles preconfigured by the CloudFormation template.
Testing the Automated Incident Response
This example requires IoT devices to be enabled to send device-side metrics to the IoT service. To test the solution using an Amazon EC2 instance:
- Follow the steps in the guide to Create a virtual device with Amazon EC2
- Install the IoT Device Client on the virtual device created in Step 1
- Follow the Quick Start steps in the Device Client installation guide as listed
- During the client setup (when running setup.sh), ensure you specify y when prompted to Enable Device Defender feature?
- Trigger a security profile violation by opening an unauthorized port on the instance
- Connect to the EC2 instance using Session Manager
- Install Netcat
sudo yum install nc -y
- Start listening on an unauthorized port:
sudo nc --listen 123
- Validate a rule violation for an unauthorized port has started the incident response process
- Check the AWS IoT console after the AWS IoT Device Defender heartbeat time has elapsed (default is 300 seconds) to verify the “DeviceRuleBaseline” security profile has detected a violation
- Check the Incident Manager console to verify a “Critical IoT Device Incident” has been created
- View the QUARANTINED Thing Group in the console. Under “Things”, verify that this group contains the thing representing the EC2 instance
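The console checks above can be approximated with a short script. This sketch assumes `iot_client` behaves like `boto3.client("iot")` and, for brevity, reads only the first page of each API response.

```python
def verify_quarantine(iot_client, thing_name, group_name="QUARANTINED"):
    """Report active violations for a thing and whether it was quarantined."""
    violations = iot_client.list_active_violations(thingName=thing_name)[
        "activeViolations"
    ]
    members = iot_client.list_things_in_thing_group(thingGroupName=group_name)[
        "things"
    ]
    return {
        "violating_profiles": sorted(
            {v["securityProfileName"] for v in violations}
        ),
        "quarantined": thing_name in members,
    }
```

For the test scenario in this post, a successful run would list the DeviceRuleBaseline profile among the violating profiles and report the thing as quarantined.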
Incident response is critical to mitigating risks and ensuring compliance with industry standards and regulations. Lack of an effective incident response process can lead to incidents having a longer recovery time and increased risk of compromise to data or system availability. Using AWS IoT Device Defender and AWS Systems Manager Incident Manager can help establish an automated workflow for quickly mitigating IoT incidents and ensuring devices maintain a secure configuration.