I am using AWS OpsWorks to manage AWS resources for my application and to monitor the health of application layer instances. AWS OpsWorks provides auto healing functionality, which restarts unhealthy or failed application layer instances. Under certain circumstances, AWS OpsWorks may also start or restart instances that pass health checks and have not failed.

AWS OpsWorks may unexpectedly start or restart instances under the following conditions:

  • Auto healing is enabled for an OpsWorks-managed instance and the instance is stopped or terminated using the Amazon EC2 console, API, or CLI rather than through OpsWorks.
  • OpsWorks is unable to communicate with the OpsWorks agent running on a managed instance for approximately 5 minutes, after which OpsWorks considers the instance to have failed.

Follow these steps to help prevent OpsWorks from unexpectedly starting or restarting managed instances:

  1. If auto healing is enabled for an OpsWorks-managed instance, use only the OpsWorks console, API, or CLI to manage the instance. Changes made outside OpsWorks (for example, stopping the instance from the Amazon EC2 console) are undone when they leave the instance in a state that is inconsistent with the OpsWorks instance settings. For more information about managing OpsWorks instances with the OpsWorks console, see Manually Starting, Stopping, and Rebooting 24/7 Instances and Deleting AWS OpsWorks Instances. For more information about how auto healing works, see Using Auto Healing to Replace Failed Instances. Auto healing is enabled by default for all layers, but you can edit the layer's general settings to disable it.
  2. Check the opsworks-agent.keep_alive.log file on the instance. It should contain successive “Reporting keepalive” entries written at 1-minute intervals. If there are gaps of 5 minutes or more between entries, OpsWorks may have considered the instance failed during that time, and you may need to troubleshoot potential issues with networking, firewalls, and network name resolution. Also inspect the opsworks-agent.statistics.log file to evaluate the general health of the instance at the time of the auto heal or reboot. Note that these log files are retained only if the root device for the instance is Amazon EBS–backed, because instance store–backed root devices are ephemeral. For high-traffic or high-volume applications, consider implementing High Availability for Amazon VPC NAT Instances so that NAT is not a single point of failure for your application.
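
The keepalive check in step 2 can be partly automated. The following Python sketch scans log lines for “Reporting keepalive” entries and reports any pair of consecutive entries that are 5 or more minutes apart. It rests on an assumption that is not confirmed by this article: that each entry begins with a bracketed timestamp of the form [YYYY-MM-DD HH:MM:SS]. The function name find_keepalive_gaps and the regular expression are illustrative only; adjust the pattern to match the entries in your own log file.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: each log entry carries a timestamp such as "[2015-03-01 12:00:00]".
# Verify this against your own opsworks-agent.keep_alive.log before relying on it.
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def find_keepalive_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (previous, current) timestamp pairs that are at least
    `threshold` apart between consecutive "Reporting keepalive" entries."""
    gaps = []
    previous = None
    for line in lines:
        if "Reporting keepalive" not in line:
            continue  # ignore unrelated log entries
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # entry does not match the assumed timestamp format
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None and current - previous >= threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

To use it, open the keepalive log on the instance and pass the file object (or any iterable of lines) to find_keepalive_gaps. An empty result means that no gap of 5 minutes or more was found among the entries that matched the assumed format; it does not by itself rule out networking problems.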
