I received a notice stating that Amazon EC2 detected degradation of the underlying hardware hosting my EC2 instance. What do I need to do?

Last updated: 2020-10-08

I received a notice stating that there is degradation of the underlying hardware hosting my Amazon Elastic Compute Cloud (Amazon EC2) instance. What do I need to do?

Short description

If a hardware malfunction occurs, then Amazon EC2 tags the specific hardware as faulty. Any instances running on the hypervisor of the faulty hardware are moved to healthy hardware. During the move, Amazon EBS-backed instances are stopped and instance store-backed instances are terminated. Amazon EC2 sends a notification by email and to your Personal Health Dashboard that informs you of the hardware degradation and of the upcoming instance stop or termination.
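To confirm the upcoming stop or termination outside of the email notification, you can query the scheduled events that Amazon EC2 reports for the instance. The following is a minimal sketch, assuming a boto3 EC2 client is passed in. The function name scheduled_events and the instance ID in the usage comment are illustrative and aren't part of the original article.

```python
def scheduled_events(ec2_client, instance_id):
    """Return the scheduled events that Amazon EC2 reports for one instance.

    ec2_client is assumed to be a boto3 EC2 client.
    """
    response = ec2_client.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that aren't running
    )
    events = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            events.append(
                {
                    "code": event.get("Code"),  # for example, "instance-stop"
                    "description": event.get("Description"),
                    "not_before": event.get("NotBefore"),
                }
            )
    return events


# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   for event in scheduled_events(boto3.client("ec2"), "i-0123456789abcdef0"):
#       print(event["code"], event["not_before"], event["description"])
```

An empty list means that no scheduled event is currently reported for the instance.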

Note: Instances launched from an Amazon EC2 Auto Scaling group might terminate as soon as the instance’s system status check fails. If this occurs, then the Auto Scaling group launches a replacement for the terminated instance. In that case, the original instance might no longer appear in your dashboard by the time you receive the hardware degradation notification from Amazon EC2. You can review the AWS CloudTrail log file for that instance to view the termination. For more information on how to check an instance from CloudTrail logs, see How do I search CloudTrail logs for API calls to run, stop, start, and terminate EC2 instances?
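The CloudTrail review described in the preceding note can also be done programmatically. The following is a minimal sketch, assuming a boto3 CloudTrail client is passed in and that the termination was recorded as a TerminateInstances event. The function name find_termination_events is illustrative. The CloudTrail LookupEvents API covers only the most recent 90 days of management events.

```python
def find_termination_events(cloudtrail_client, instance_id):
    """Return TerminateInstances events that CloudTrail recorded for an instance.

    cloudtrail_client is assumed to be a boto3 CloudTrail client.
    """
    request = {
        "LookupAttributes": [
            {"AttributeKey": "ResourceName", "AttributeValue": instance_id}
        ]
    }
    matches = []
    while True:
        page = cloudtrail_client.lookup_events(**request)
        for event in page.get("Events", []):
            if event.get("EventName") == "TerminateInstances":
                matches.append(
                    {
                        "time": event.get("EventTime"),
                        "user": event.get("Username"),
                        "event_id": event.get("EventId"),
                    }
                )
        token = page.get("NextToken")
        if not token:
            return matches
        request["NextToken"] = token  # fetch the next page of results


# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   print(find_termination_events(boto3.client("cloudtrail"), "i-0123456789abcdef0"))
```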

Resolution

If you receive a hardware degradation notice, you can manually stop and start the instance using the Amazon EC2 console or the AWS Command Line Interface (AWS CLI). Stopping the instance removes it from the faulty hardware. Starting the instance launches it on healthy hardware.

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.

Stop and start the instance

Note: A stop and start isn't equivalent to a reboot. A rebooted instance remains on the same host, so you must stop the instance and then start it to migrate it to healthy hardware.

Important:

  • This procedure requires a stop and start of your EC2 instance. If your instance is instance store-backed or has instance store volumes that contain data, then that data is lost when the instance is stopped. For more information, see Determining the root device type of your instance.
  • If your instance is part of an Amazon EC2 Auto Scaling group, or if your instance is launched by a service that uses AWS Auto Scaling (for example, Amazon EMR, AWS CloudFormation, or AWS Elastic Beanstalk), then stopping the instance could terminate the instance. Whether the instance is terminated in this scenario depends on the instance scale-in protection settings for your Auto Scaling group. If your instance is part of an Auto Scaling group, temporarily remove the instance from the Auto Scaling group before starting the resolution steps.
  • Stopping and starting the instance changes the public IP address of your instance. It's a best practice to use an Elastic IP address instead of a public IP address when routing external traffic to your instance.
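The preceding checks can be sketched in code before you stop the instance. This example is illustrative only. It assumes that instance is one entry from the Instances list returned by the boto3 EC2 describe_instances call, and that Auto Scaling group membership can be read from the aws:autoscaling:groupName tag that Amazon EC2 Auto Scaling adds to the instances it launches. The function names are hypothetical. Moving the instance to Standby is one possible way to temporarily remove it from the group and is an assumption here, not a step named in this article.

```python
ASG_TAG_KEY = "aws:autoscaling:groupName"


def preflight_warnings(instance):
    """Return warnings to review before stopping the instance.

    instance is assumed to be one item from
    describe_instances()["Reservations"][n]["Instances"].
    """
    warnings = []
    if instance.get("RootDeviceType") == "instance-store":
        warnings.append("Root device is instance store: its data is lost.")
    tags = {tag["Key"]: tag["Value"] for tag in instance.get("Tags", [])}
    group_name = tags.get(ASG_TAG_KEY)
    if group_name:
        warnings.append(
            f"Instance belongs to Auto Scaling group {group_name}: "
            "a stop might lead to termination."
        )
    if instance.get("PublicIpAddress"):
        warnings.append(
            "Public IP address changes after the stop and start "
            "unless it is an Elastic IP address."
        )
    return warnings


def move_to_standby(autoscaling_client, instance_id, group_name):
    """Temporarily take the instance out of service in its Auto Scaling group.

    autoscaling_client is assumed to be a boto3 Auto Scaling client.
    """
    autoscaling_client.enter_standby(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
        # Lower the desired capacity so that the group doesn't launch a replacement.
        ShouldDecrementDesiredCapacity=True,
    )
```

This check doesn't detect instance store data volumes that are attached to an Amazon EBS-backed instance, so review those separately.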

To stop and start the instance, complete the following steps:

  1. Open the Amazon EC2 console and then select the instance.
  2. Select Actions, Instance State, Stop.
  3. Select Yes, Stop.
    Note: If your instance is stuck in the stopping state, you might want to force the instance to stop. For more information on stopping an instance stuck in the stopping state, see Troubleshooting stopping your instance.
  4. Select the instance again.
  5. Select Actions, Instance State, Start.
  6. Select Yes, Start.
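
If you prefer to script the preceding console steps, the following is a minimal sketch, assuming a boto3 EC2 client is passed in. The function name stop_and_start and the force parameter are illustrative. Forcing the stop corresponds to the force stop mentioned in the note for an instance that is stuck in the stopping state.

```python
def stop_and_start(ec2_client, instance_id, force=False):
    """Stop the instance, wait until it is stopped, and then start it again.

    ec2_client is assumed to be a boto3 EC2 client. A reboot is deliberately
    not used because it doesn't move the instance to different hardware.
    """
    ec2_client.stop_instances(InstanceIds=[instance_id], Force=force)
    # Block until Amazon EC2 reports the "stopped" state.
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2_client.start_instances(InstanceIds=[instance_id])
    # Block until Amazon EC2 reports the "running" state.
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   stop_and_start(boto3.client("ec2"), "i-0123456789abcdef0")
```

The waiters raise an error if the instance doesn't reach the expected state within their polling limits, so a stuck stop surfaces as an exception instead of an indefinite wait.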

Note: The hardware degradation notification remains in your Personal Health Dashboard with a status of Completed until the stop or terminate date listed in the notification.

(Optional) Set up instance recovery for your instances

You can create an Amazon CloudWatch alarm that automatically recovers instances experiencing underlying hardware degradation. For information on how to set up the CloudWatch alarm, see Recover your instance.
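
As an illustration of such an alarm, the following is a minimal sketch, assuming a boto3 CloudWatch client is passed in. The function name, the alarm name, and the period and evaluation settings are example values and aren't taken from this article. The recover action ARN assumes the standard aws partition. Not every instance configuration supports recovery, so check Recover your instance for the requirements.

```python
def create_recovery_alarm(cloudwatch_client, instance_id, region):
    """Create an alarm that recovers the instance when its system status check fails.

    cloudwatch_client is assumed to be a boto3 CloudWatch client.
    Returns the name of the alarm that was requested.
    """
    alarm_name = f"recover-{instance_id}"  # example naming scheme
    cloudwatch_client.put_metric_alarm(
        AlarmName=alarm_name,
        Namespace="AWS/EC2",
        MetricName="StatusCheckFailed_System",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Maximum",
        Period=60,  # seconds per evaluation period (example value)
        EvaluationPeriods=2,  # consecutive failing periods (example value)
        Threshold=1.0,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[f"arn:aws:automate:{region}:ec2:recover"],
    )
    return alarm_name


# Usage (requires boto3 and AWS credentials; the values are placeholders):
#   import boto3
#   create_recovery_alarm(
#       boto3.client("cloudwatch", region_name="us-east-1"),
#       "i-0123456789abcdef0",
#       "us-east-1",
#   )
```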

