Why is my image build pipeline failing with the error "Step timed out while step is verifying the Systems Manager Agent availability on the target instance(s)" in Image Builder?

Last updated: 2022-06-22

My image build times out in EC2 Image Builder. Then, it returns the error "failure message = 'Step timed out while step is verifying the SSM Agent availability on the target instance(s)'". How do I resolve this?

Short description

EC2 Image Builder uses AWS Systems Manager Automation to build custom images. Make sure that the Amazon Elastic Compute Cloud (Amazon EC2) instance that's used to build images and run tests has access to the AWS Systems Manager service.

The error message, failure message = 'Step timed out while step is verifying the SSM Agent availability on the target instance(s)', can occur due to the following reasons:

  • The build or test instance can't access Systems Manager endpoints. To resolve this issue, check the inbound and outbound rules for your security group and network access control list (network ACL).
  • The instance profile doesn't have the required permissions. To resolve this issue, verify that the instance profile has the correct policies attached.
  • The instance can't reach Instance Metadata Service (IMDS). To resolve this issue, verify that the instance can reach IMDS.
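  • AWS Systems Manager Agent (SSM Agent) isn't installed on the base image. To resolve this issue, verify that SSM Agent is preinstalled on the base image, or install it through user data.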

Resolution

Check the outbound and inbound rules for your security group and network ACL

If your build or test instance can't access Systems Manager endpoints, check the following:

  • Your security group allows outbound traffic on port 443.
  • Your network ACL allows inbound traffic on ephemeral ports (1024–65535) and outbound traffic on port 443. Network ACLs are stateless, so return traffic must be allowed explicitly.
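
The rules described above can be listed with the AWS CLI. The following is a minimal sketch, assuming that the AWS CLI is installed and configured with permission to describe EC2 resources. The function names and the example IDs are hypothetical.

```shell
# Show the outbound rules of one security group so that you can
# confirm that port 443 is allowed.
show_sg_egress() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[].IpPermissionsEgress[]' \
    --output table
}

# Show the entries of the network ACL that is associated with one subnet.
# Entries with Egress=false are inbound rules; Egress=true are outbound rules.
show_subnet_nacl() {
  aws ec2 describe-network-acls \
    --filters "Name=association.subnet-id,Values=$1" \
    --query 'NetworkAcls[].Entries[]' \
    --output table
}

# Example with placeholder IDs:
#   show_sg_egress sg-0123456789abcdef0
#   show_subnet_nacl subnet-0123456789abcdef0
```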

For public subnet builds:

  • The subnet must have the Enable auto-assign public IPv4 address setting turned on.
  • The route table must have a route to an internet gateway.

For private subnet builds:

  • The subnet must have a route to a NAT gateway or NAT instance, or the VPC must have AWS PrivateLink endpoints for Systems Manager (ssm, ssmmessages, ec2messages) and Image Builder. If logging is enabled, then the VPC must also have endpoints for Amazon Simple Storage Service (Amazon S3) or Amazon CloudWatch.
  • The security group for the Amazon Virtual Private Cloud (Amazon VPC) endpoint must allow inbound traffic on port 443 from the VPC CIDR.
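
For a private subnet that relies on AWS PrivateLink, the following sketch compares the endpoints that exist in a VPC with the ones listed above. It assumes the usual com.amazonaws.region.service naming for endpoint service names, including imagebuilder for Image Builder. The function names and the example VPC ID are hypothetical.

```shell
# Print the endpoint service names for Systems Manager and Image Builder
# in one Region (assumes the com.amazonaws.<region>.<service> pattern).
required_services() {
  for svc in ssm ssmmessages ec2messages imagebuilder; do
    echo "com.amazonaws.$1.${svc}"
  done
}

# Print each required service name that is absent from the
# whitespace-separated list passed as the second argument.
missing_services() {
  existing=$(printf '%s' "$2" | tr '\t\n' '  ')
  for name in $(required_services "$1"); do
    case " $existing " in
      *" $name "*) ;;
      *) echo "$name" ;;
    esac
  done
}

# Example with a placeholder VPC ID:
#   existing=$(aws ec2 describe-vpc-endpoints \
#     --filters "Name=vpc-id,Values=vpc-0123456789abcdef0" \
#     --query 'VpcEndpoints[].ServiceName' --output text)
#   missing_services us-east-1 "$existing"
```

An empty result means that all four endpoints exist. It doesn't confirm that their security groups allow port 443.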

Verify that the instance profile has the correct policies

The instance profile is the AWS Identity and Access Management (IAM) role that's defined in the infrastructure configuration. If it doesn't have the required permissions, then the build fails. The instance profile must have the following managed policies attached to have permission to build images:

  • EC2InstanceProfileForImageBuilder
  • EC2InstanceProfileForImageBuilderECRContainerBuilds (for Docker images)
  • AmazonSSMManagedInstanceCore

You can also create custom policies that have similar permissions to the preceding managed policies.

Note: Check the role's trust policy to make sure that ec2.amazonaws.com is allowed to assume the role.
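
To review the role from the command line, the following sketch lists the attached managed policies and prints the trust policy for comparison. It assumes that the AWS CLI is configured with IAM read permissions. The function names are hypothetical, and the role name that you pass is the IAM role inside the instance profile, which can differ from the instance profile name.

```shell
# Print a reference trust policy that allows ec2.amazonaws.com
# to assume the role.
print_ec2_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Show the managed policies attached to a role, then its trust policy.
show_role_setup() {
  aws iam list-attached-role-policies --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' --output text
  aws iam get-role --role-name "$1" \
    --query 'Role.AssumeRolePolicyDocument' --output json
}

# Example with a placeholder role name:
#   show_role_setup MyImageBuilderInstanceRole
#   print_ec2_trust_policy
```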

Verify that the instance can reach IMDS

IMDS is used to access metadata from a running instance. If your instance can't reach IMDS, then the build fails. Make sure that your operating system's (OS) firewall allows traffic to 169.254.169.254 on port 80 so that the instance can reach IMDS.

Run the following command to test connectivity:

$ telnet 169.254.169.254 80
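
If telnet isn't installed on the instance, a curl request is an alternative check. The following sketch assumes that curl is available and uses the IMDSv2 token flow. The function name imds_check is hypothetical.

```shell
# Request a short-lived IMDSv2 session token, then use it to read the
# instance ID. Prints the instance ID on success and returns non-zero
# if IMDS can't be reached.
imds_check() {
  base="http://169.254.169.254"
  token=$(curl -sf --max-time 5 -X PUT "$base/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  curl -sf --max-time 5 -H "X-aws-ec2-metadata-token: $token" \
    "$base/latest/meta-data/instance-id"
}

# Example, run on the build or test instance:
#   imds_check || echo "IMDS is not reachable"
```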

If you're using a proxy, then configure SSM Agent to work with a proxy. For Linux, see Configuring SSM Agent to use a proxy (Linux). For Microsoft Windows, see Configure SSM Agent to use a proxy for Windows Server instances.

Verify that SSM Agent is installed on the base image

If the base image doesn't have SSM Agent preinstalled, then install it through user data so that it's running when the build phase starts. If you add custom user data to the image recipe, then SSM Agent must either be preinstalled on the base image or be installed by that user data.
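
As an illustration, the following sketch writes a user data script that installs and starts SSM Agent at first boot. It assumes an RPM-based, x86_64 Linux base image with yum and systemd. The download URL is the global one from the SSM Agent manual installation instructions, so confirm it for your OS and architecture before you use it. The function name write_ssm_user_data is hypothetical.

```shell
# Write a user data script to the file path given as the first argument.
# Assumption: RPM-based x86_64 image with yum and systemd.
write_ssm_user_data() {
  cat > "$1" <<'EOF'
#!/bin/bash
# Install SSM Agent from the global download location, then start it
# and enable it for later boots.
yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
systemctl enable amazon-ssm-agent
systemctl start amazon-ssm-agent
EOF
}

# Example:
#   write_ssm_user_data ./user-data.sh
```

You can then paste the contents of the generated file into the user data field of the image recipe.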

To check if SSM Agent is preinstalled on the base image, launch an Amazon EC2 instance using the base image. Then, run the following command for the OS of your instance:

Amazon Linux

$ sudo status amazon-ssm-agent

Amazon Linux 2

$ sudo systemctl status amazon-ssm-agent

macOS

Check for an agent log file at /var/log/amazon/ssm/amazon-ssm-agent.log.

SUSE Linux Enterprise Server

$ sudo systemctl status amazon-ssm-agent

Ubuntu Server 16.04 (32-bit)

$ sudo status amazon-ssm-agent

Ubuntu Server 16.04 64-bit instances (deb)

$ sudo systemctl status amazon-ssm-agent

Ubuntu Server 16.04, 18.04, and 20.04 LTS and 20.10 STR 64-bit (Snap)

$ sudo systemctl status snap.amazon-ssm-agent.amazon-ssm-agent.service

Windows Server

Run in PowerShell:

Get-Service AmazonSSMAgent

Disable "Terminate instance on failure"

If the preceding resolutions don't resolve the issue, then complete the following steps:

1.    Open the EC2 Image Builder console.

2.    Disable Terminate instance on failure under Infrastructure configuration, and run the pipeline again.

3.    Connect to the instance and run the following commands to verify the connection to Systems Manager endpoints:

Linux instance

$ curl -v https://ssm.region.amazonaws.com
$ curl -v https://ec2messages.region.amazonaws.com
$ curl -v https://ssmmessages.region.amazonaws.com

Windows instance

Test-NetConnection ssm.region.amazonaws.com -port 443
Test-NetConnection ec2messages.region.amazonaws.com -port 443
Test-NetConnection ssmmessages.region.amazonaws.com -port 443

Note: Replace region with your AWS Region.

4.    Use the following paths to check the SSM Agent logs for any failures or errors:

Linux

  • /var/log/amazon/ssm/amazon-ssm-agent.log
  • /var/log/amazon/ssm/errors.log

Windows

  • %PROGRAMDATA%\Amazon\SSM\Logs\amazon-ssm-agent.log
  • %PROGRAMDATA%\Amazon\SSM\Logs\errors.log
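
On a Linux instance, the three endpoint checks in step 3 can be wrapped in a loop so that the Region is entered only once. This is a minimal sketch that assumes curl is installed. The function names are hypothetical.

```shell
# Print the three Systems Manager endpoint URLs for one Region.
ssm_endpoint_urls() {
  for svc in ssm ec2messages ssmmessages; do
    echo "https://${svc}.$1.amazonaws.com"
  done
}

# Try each endpoint and report whether a connection could be made.
# Any HTTP response counts as reachable, because the goal is only to
# confirm network and TLS connectivity on port 443.
check_ssm_endpoints() {
  for url in $(ssm_endpoint_urls "$1"); do
    if curl -s -o /dev/null --max-time 10 "$url"; then
      echo "reachable:   $url"
    else
      echo "unreachable: $url"
    fi
  done
}

# Example:
#   check_ssm_endpoints us-east-1
```

An unreachable endpoint points back to the security group, network ACL, route table, or VPC endpoint checks earlier in this article.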
