Why is my AWS OpsWorks Stacks instance not starting and stuck in the "start_failed" state?

Last updated: 2021-12-29

One of my Amazon Elastic Compute Cloud (Amazon EC2) instances that's managed by AWS OpsWorks Stacks is stuck in the "start_failed" state. Why won't my OpsWorks Stacks instance enter the "online" state, and how do I troubleshoot the issue?

Short description

An OpsWorks Stacks instance can enter the start_failed state and fail to start during a setup lifecycle event for several reasons. The most common cause is a networking issue.

To troubleshoot the issue, first verify whether your EC2 instance can connect to the OpsWorks Stacks service. Then, follow the steps in the matching section of this article: If your EC2 instance can't connect to the OpsWorks Stacks service, or If your EC2 instance can connect to the OpsWorks Stacks service.

Note: For instances stuck in the setup_failed state, see Why is my AWS OpsWorks Stacks instance not starting and stuck in the "setup_failed" state?

Resolution

Verify whether your EC2 instance can connect to the OpsWorks Stacks service

1.    Log in to your Amazon EC2 instance.

2.    Run the following netcat (nc) command from the Linux command line to send a test request to the OpsWorks Stacks endpoint that the instance uses:

Important: Replace opsworks.us-east-1.amazonaws.com with the OpsWorks Stacks endpoint that you're using.

nc -vz opsworks.us-east-1.amazonaws.com 443

If your EC2 instance can connect to the OpsWorks Stacks service, then the command output looks similar to the following:

Ncat: Connected to <ipaddress>

Note: If netcat isn't installed on your EC2 instance, install the nc package. On yum-based distributions such as Amazon Linux, run the following command:

sudo yum install -y nc
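If you want to script the connectivity test, the following minimal sketch wraps the nc command above in a function and falls back to bash's built-in /dev/tcp redirection when nc isn't installed. The function names opsworks_endpoint and check_tcp are hypothetical helpers, and the sketch assumes that your endpoint follows the opsworks.<region>.amazonaws.com pattern shown above; pass a different host name if your stack uses another endpoint.

```shell
#!/usr/bin/env bash
# Sketch: test TCP reachability of an OpsWorks Stacks endpoint on port 443.
# Assumption: the endpoint follows the opsworks.<region>.amazonaws.com pattern.

# Hypothetical helper: build the endpoint host name for a Region.
opsworks_endpoint() {
  local region="$1"
  printf 'opsworks.%s.amazonaws.com\n' "$region"
}

# Hypothetical helper: return 0 if HOST:PORT accepts a TCP connection
# within 5 seconds. Uses nc when it's installed; otherwise falls back to
# bash's /dev/tcp redirection.
check_tcp() {
  local host="$1" port="${2:-443}"
  if command -v nc >/dev/null 2>&1; then
    nc -vz -w 5 "$host" "$port"
  else
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
  fi
}
```

For example, after you source these functions, check_tcp "$(opsworks_endpoint us-east-1)" 443 returns 0 when the endpoint is reachable and a nonzero status when it isn't.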

If your EC2 instance can't connect to the OpsWorks Stacks service

If a NAT gateway provides internet access to your EC2 instance

Follow the instructions in Why can't my EC2 instances access the internet using a NAT gateway?

If an internet gateway provides internet access to your EC2 instance

Follow the instructions in Why can't my EC2 instance connect to the internet using an internet gateway?

If a NAT instance provides internet access to your EC2 instance

Open the EC2 console and verify the following:

  • The NAT instance is in the running state.
    Note: If the NAT instance isn't in the running state, start it.
  • The NAT instance is passing status checks.
    Note: If the NAT instance isn't passing status checks, create a new NAT instance. Then, update the route table associated with your EC2 instance's subnet so that its default route (0.0.0.0/0) targets the new NAT instance.
  • The EC2 instance is in a default Amazon Virtual Private Cloud (Amazon VPC).
    Note: An OpsWorks Stacks-managed EC2 instance always enters the start_failed state if it's launched outside of a default Amazon VPC.

For more information, see View status checks.
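The NAT instance checks above can also be done with the AWS CLI. The following sketch assumes that the AWS CLI is configured with permissions for ec2:DescribeInstanceStatus, ec2:DescribeRouteTables, and ec2:ReplaceRoute; the instance, subnet, and route table IDs are placeholders, and the function names are hypothetical. Note that the association.subnet-id filter returns nothing if the subnet uses the VPC's main route table implicitly.

```shell
#!/usr/bin/env bash
# Sketch: inspect a NAT instance and the default route that points to it.
# Assumptions: AWS CLI configured; all IDs passed in are placeholders.

# Print "<state> <instance-status> <system-status>" for an instance.
nat_instance_status() {
  aws ec2 describe-instance-status \
    --instance-ids "$1" \
    --include-all-instances \
    --query 'InstanceStatuses[0].[InstanceState.Name,InstanceStatus.Status,SystemStatus.Status]' \
    --output text
}

# Return 0 only if the status line reads "running ok ok".
nat_status_ok() {
  local state inst sys
  read -r state inst sys <<<"$1"
  [[ "$state" == "running" && "$inst" == "ok" && "$sys" == "ok" ]]
}

# Print the targets of the 0.0.0.0/0 route in a subnet's explicitly
# associated route table (None for fields that don't apply).
default_route_target() {
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query "RouteTables[].Routes[?DestinationCidrBlock=='0.0.0.0/0'].[InstanceId,NatGatewayId,GatewayId]" \
    --output text
}

# Point the default route of a route table at a (new) NAT instance.
point_default_route_at() {
  aws ec2 replace-route \
    --route-table-id "$1" \
    --destination-cidr-block 0.0.0.0/0 \
    --instance-id "$2"
}
```

For example, nat_status_ok "$(nat_instance_status i-0123456789abcdef0)" succeeds only when the placeholder instance is running and both status checks report ok.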

If your EC2 instance uses VPC endpoints to reach AWS services

  • Verify that your VPC endpoints are correct and reachable within the Amazon VPC that you're using:
    Open the Amazon VPC console.
    In the navigation pane, under Virtual Private Cloud, choose Endpoints.
    Review the VPC endpoints associated with your EC2 instance's VPC to make sure that they're correct and reachable.
  • Verify that your VPC endpoints allow the Amazon Simple Storage Service (Amazon S3) actions that OpsWorks Stacks requires.

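The console review above can be approximated from the AWS CLI. This sketch lists the endpoints in a VPC and checks whether an Amazon S3 endpoint is in the available state. It assumes a configured AWS CLI with ec2:DescribeVpcEndpoints permission; the VPC ID is a placeholder and the function names are hypothetical. It doesn't evaluate endpoint policies, so you still need to confirm that the policy allows the required S3 actions.

```shell
#!/usr/bin/env bash
# Sketch: list VPC endpoints and check for an available Amazon S3 endpoint.
# Assumptions: AWS CLI configured; the VPC ID passed in is a placeholder.

# Print "<service-name>\t<endpoint-type>\t<state>" for each endpoint.
list_vpc_endpoints() {
  aws ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=$1" \
    --query 'VpcEndpoints[].[ServiceName,VpcEndpointType,State]' \
    --output text
}

# Read endpoint lines on stdin; succeed if an S3 endpoint is available.
# The state comparison is case-insensitive (requires bash 4 or later).
has_available_s3_endpoint() {
  local service type state
  while IFS=$'\t' read -r service type state; do
    if [[ "$service" == *.s3 && "${state,,}" == "available" ]]; then
      return 0
    fi
  done
  return 1
}
```

Usage: list_vpc_endpoints vpc-0123456789abcdef0 | has_available_s3_endpoint && echo "S3 endpoint available".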
If your EC2 instance can connect to the OpsWorks Stacks service

Check the EC2 instance's IAM permissions

Verify that an AWS Identity and Access Management (IAM) role for the instance profile exists and includes all of the required permissions.

If the instance profile's IAM role doesn't exist or is missing required permissions, do the following:

1.    Stop the instance.

2.    Detach the instance profile role from the EC2 instance:
      In the EC2 console, choose Instances, and then select your EC2 instance.
      Choose Actions, choose Security, and then choose Modify IAM role.
      Choose No IAM Role, and then choose Save.

3.    Attach a replacement instance profile that has the required permissions to the existing EC2 instance.
-or-
Replace the EC2 instance in OpsWorks Stacks.

Note: For more information about replacing an EC2 instance in OpsWorks Stacks, see Adding an instance to a layer.
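The instance profile steps above have AWS CLI equivalents. This sketch assumes a configured AWS CLI with the ec2:DescribeIamInstanceProfileAssociations, ec2:AssociateIamInstanceProfile, ec2:ReplaceIamInstanceProfileAssociation, and iam:PassRole permissions. The instance ID and profile name are placeholders, and the function names are hypothetical. replace-iam-instance-profile-association swaps the profile in one call, as an alternative to detaching and then attaching.

```shell
#!/usr/bin/env bash
# Sketch: inspect and replace the instance profile attached to an EC2 instance.
# Assumptions: AWS CLI configured; instance IDs and profile names are placeholders.

# Print "<association-id>\t<profile-arn>" for the instance's current association.
current_profile_association() {
  aws ec2 describe-iam-instance-profile-associations \
    --filters "Name=instance-id,Values=$1" "Name=state,Values=associated" \
    --query 'IamInstanceProfileAssociations[].[AssociationId,IamInstanceProfile.Arn]' \
    --output text
}

# Extract the profile name from an instance profile ARN
# (the segment after the last slash, which also drops any path).
profile_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Swap an existing association to a different instance profile.
replace_profile() {
  local association_id="$1" profile_name="$2"
  aws ec2 replace-iam-instance-profile-association \
    --association-id "$association_id" \
    --iam-instance-profile "Name=$profile_name"
}

# Attach an instance profile to an instance that currently has none.
attach_profile() {
  local instance_id="$1" profile_name="$2"
  aws ec2 associate-iam-instance-profile \
    --instance-id "$instance_id" \
    --iam-instance-profile "Name=$profile_name"
}
```

If current_profile_association prints nothing, the instance has no profile attached; use attach_profile. Otherwise, pass the printed association ID to replace_profile.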

For EC2 instances backed by Amazon Elastic Block Store (Amazon EBS), verify that the instance's root device volume isn't full

For instructions, see either View free disk space for Linux or View free disk space for Windows.
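On Linux instances, a quick scripted check of the root volume might look like the following sketch. It uses the POSIX output format of df (-P) so that the usage percentage is always the fifth column. The function names and the 90% default threshold are assumptions for illustration; choose a threshold that suits your workload.

```shell
#!/usr/bin/env bash
# Sketch: report root volume usage and flag it when usage crosses a threshold.
# The 90% default threshold is an arbitrary assumption.

# Print the used percentage (digits only) of the file system mounted at "/".
root_usage_pct() {
  df -P / | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Return 0 if root usage is at or above THRESHOLD percent (default 90).
root_volume_nearly_full() {
  local threshold="${1:-90}" used
  used="$(root_usage_pct)"
  (( used >= threshold ))
}
```

For example, root_volume_nearly_full && echo "Root volume is at least 90% full".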

Verify that the EC2 instance accepts IMDSv1 requests

To check which Instance Metadata Service (IMDS) version your instance accepts, and to reconfigure the instance if needed, see Configure the instance metadata options.

Note: OpsWorks Stacks supports only Instance Metadata Service Version 1 (IMDSv1), not IMDSv2. If the instance requires IMDSv2, reconfigure it so that it also accepts IMDSv1 requests.
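The following sketch shows one way to test this from the instance itself. A token-less (IMDSv1) request to the metadata endpoint at 169.254.169.254 returns HTTP 200 when IMDSv1 is allowed and 401 when the instance requires IMDSv2. The allow_imdsv1 function calls aws ec2 modify-instance-metadata-options, which makes session tokens optional; run it from a machine with AWS CLI access and ec2:ModifyInstanceMetadataOptions permission. The function names and the instance ID are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: detect whether token-less IMDSv1 requests are accepted.
# Run imds_v1_http_code on the EC2 instance itself.

# Print the HTTP status code of an IMDSv1 request (000 if unreachable).
imds_v1_http_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
    http://169.254.169.254/latest/meta-data/
}

# Map an HTTP status code to a short description.
imds_mode_from_code() {
  case "$1" in
    200) echo "imdsv1-allowed" ;;
    401) echo "imdsv2-required" ;;
    *)   echo "unreachable-or-disabled" ;;
  esac
}

# Allow IMDSv1 by making session tokens optional for an instance.
allow_imdsv1() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-tokens optional \
    --http-endpoint enabled
}
```

For example, imds_mode_from_code "$(imds_v1_http_code)" prints imdsv2-required on an instance that rejects IMDSv1 requests.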

For EC2 instances backed by a custom AMI, verify that the AMI is configured correctly

For more information, see Create a custom Linux Amazon Machine Image (AMI) from an AWS OpsWorks Stacks instance.

Verify that the OpsWorks Stacks agent installed on the EC2 instance is running

1.    Log in to your Amazon EC2 instance.

2.    Check the status of the OpsWorks Stacks agent by running the following command from the Linux command line:

sudo service opsworks-agent status

If the OpsWorks Stacks agent is running, then the command output looks similar to the following:

Active: active (running)

If the OpsWorks Stacks agent isn't running, then the command output looks similar to the following:

Active: inactive (dead)

If the OpsWorks Stacks agent isn't running, then start the agent by running the following command:

sudo service opsworks-agent start
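The status check and restart above can be combined into a small script. This sketch matches the "Active: active (running)" line from the sample output above; if your instance's init system prints a different status format, the pattern may not match. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start the OpsWorks Stacks agent if its status output doesn't show it running.

# Read `service opsworks-agent status` output on stdin; succeed if it's running.
agent_is_running() {
  grep -q 'Active: active (running)'
}

# Check the agent and start it when it isn't running.
ensure_agent_running() {
  if sudo service opsworks-agent status 2>&1 | agent_is_running; then
    echo "opsworks-agent is running"
  else
    echo "opsworks-agent is not running; starting it"
    sudo service opsworks-agent start
  fi
}
```

Run ensure_agent_running on the instance after you source the functions.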

Check AWS CloudTrail for "Client.UnauthorizedOperation" errors

Review the API calls for your OpsWorks Stacks instance that are logged in AWS CloudTrail. Look for EC2 RunInstances events that returned the Client.UnauthorizedOperation error.

If you find this error, see How do I troubleshoot the encoded authorization failure message when I try to restore an Amazon EC2 instance using AWS Backup?
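To search for these events from the command line, the following sketch uses aws cloudtrail lookup-events to fetch recent RunInstances events and counts the ones whose errorCode is Client.UnauthorizedOperation. It assumes a configured AWS CLI with cloudtrail:LookupEvents permission in the Region where the instance runs (lookup-events is Region-scoped). decode_failure wraps aws sts decode-authorization-message, which requires sts:DecodeAuthorizationMessage. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find RunInstances events that failed with Client.UnauthorizedOperation.
# Assumptions: AWS CLI configured for the instance's Region.

# Print the raw CloudTrail event JSON for up to 50 recent RunInstances events.
# With --output text, several events can share one tab-separated line.
runinstances_events() {
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=RunInstances \
    --max-items 50 \
    --query 'Events[].CloudTrailEvent' \
    --output text
}

# Count Client.UnauthorizedOperation error codes in event JSON on stdin.
# grep -o prints each match on its own line, so tab-joined events are counted.
count_unauthorized() {
  grep -oE '"errorCode"[[:space:]]*:[[:space:]]*"Client\.UnauthorizedOperation"' \
    | wc -l | tr -d '[:space:]'
}

# Decode an encoded authorization failure message from an error response.
decode_failure() {
  aws sts decode-authorization-message --encoded-message "$1" \
    --query DecodedMessage --output text
}
```

Usage: runinstances_events | count_unauthorized prints how many of the fetched events failed with this error.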

If your EC2 instance is assigned to more than one layer, then make sure that each layer has the same network settings

For more information, see Adding an instance to a layer.
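As a rough automated check, the following sketch compares the automatic IP address settings (Elastic IP and public IP) across all layers that an instance belongs to. It assumes that these two layer attributes are the network settings that matter for your stack, that the AWS CLI is configured for your stack's OpsWorks Stacks endpoint Region, and that you pass an OpsWorks instance ID (not an EC2 instance ID). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare auto-assign IP settings across an instance's layers.
# Assumptions: AWS CLI configured for the stack's OpsWorks endpoint Region;
# the OpsWorks instance ID passed in is a placeholder.

# Print "<layer-name>\t<AutoAssignElasticIps>\t<AutoAssignPublicIps>" per layer.
layer_network_settings() {
  local instance_id="$1" layer_ids
  layer_ids="$(aws opsworks describe-instances --instance-ids "$instance_id" \
    --query 'Instances[0].LayerIds' --output text)"
  # layer_ids is intentionally unquoted so each ID becomes a separate argument.
  aws opsworks describe-layers --layer-ids $layer_ids \
    --query 'Layers[].[Name,AutoAssignElasticIps,AutoAssignPublicIps]' \
    --output text
}

# Read layer lines on stdin; succeed if every layer has the same two settings.
layers_consistent() {
  local distinct
  distinct="$(awk -F '\t' '{ print $2 FS $3 }' | sort -u | wc -l | tr -d '[:space:]')"
  [[ "$distinct" -le 1 ]]
}
```

Usage: layer_network_settings <opsworks-instance-id> | layers_consistent || echo "Layer network settings differ".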