Why is my CloudEndure replication process stuck at 100% with the "Finalizing Initial Sync" message appearing in the console?

Last updated: 2022-12-23

I'm using CloudEndure Migration or CloudEndure Disaster Recovery. The replication process is stuck at 100%, and the CloudEndure console shows the message "Finalizing Initial Sync".

Short description

Note: As of December 30, 2022, CloudEndure Migration is no longer available in any commercial AWS Region. CloudEndure Migration continues to be available in China, AWS GovCloud (US) Regions, and in AWS Outposts. Starting December 31, 2022, customers in commercial Regions can use AWS Application Migration Service (AWS MGN).

You might see one of two messages in the CloudEndure console if the replication process is stuck at 100% during the final stage of initial sync:

  • "Finalizing Initial Sync - Flushing Backlog"
  • "Finalizing Initial Sync - Creating First Launchable Snapshot"

The troubleshooting steps depend on which message appears.

Resolution

Finalizing Initial Sync - Flushing Backlog

Wait for the backlog to finish flushing so that the initial sync can complete.

If the source machine is write-intensive, then the backlog might grow over time, and the machine or machines might remain stuck in the Finalizing Initial Sync state on the CloudEndure console. If this occurs, then do the following:

1.    Test the replication speed. After testing the replication speed, calculate the required bandwidth, and then allocate the bandwidth to the source machine.

2.    Verify that you turned off the Network bandwidth throttling option under Replication Settings. If your configuration requires turning on this option, then make sure that you set the value to at least the minimum required bandwidth. For more information, see [Optional] Turn on network bandwidth throttling.

3.    Check the network and disk utilization of the replication server using Amazon CloudWatch metrics. If a resource throttles the server, then use a dedicated replication server or choose fast SSD data disks instead of HDD disks.
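The bandwidth calculation in step 1 can be sketched as simple arithmetic. The following is a minimal sketch that assumes the required bandwidth is roughly the sum of the average write rates of all replicated source machines. The function name required_mbps and the sample write rates are hypothetical, and you would measure the real rates yourself (for example, with a disk statistics tool on each source machine).

```shell
# required_mbps: sum average write rates (MB/s, one per argument) and
# convert the total to megabits per second (1 MB/s = 8 Mbps).
# Assumption: required bandwidth is about the sum of the write rates.
required_mbps() {
  # awk does the floating-point arithmetic that plain shell can't.
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total * 8 }'
}

# Example: three source machines that write 2.5, 4, and 1.5 MB/s on average.
required_mbps 2.5 4 1.5   # prints 64.0
```

Treat the result as a lower bound. Bursty workloads might need more headroom than the average suggests.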

To identify which replication server a specific source machine uses, run the netstat command on the source machine as shown in the following example. Note the remote IP address that the machine connects to over port 1500.

$ netstat -anp | grep ":1500"

Or, review the agent.log.0 file on the source machine to identify the exact replication server in use:

$ sudo grep ":1500" /var/lib/cloudendure/agent.log.0 | tail -n 1
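If the netstat output is long, a small filter can pull out only the replication server address. The following sketch assumes the usual Linux netstat layout, where the fifth column is the foreign (remote) address, and IPv4 addresses. The function name replication_server_ips is hypothetical.

```shell
# replication_server_ips: read `netstat -an` style lines on stdin and print
# the unique remote addresses of connections to TCP port 1500.
# Assumption: column 5 holds the foreign address in the form IP:port.
replication_server_ips() {
  awk '$5 ~ /:1500$/ { sub(/:1500$/, "", $5); print $5 }' | sort -u
}

# Usage on the source machine:
#   netstat -an | replication_server_ips
```

Listening sockets and header lines don't match the filter because their fifth column doesn't end in :1500.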

Finalizing Initial Sync - Creating First Launchable Snapshot

To troubleshoot this error message, do the following:

  • Verify that the CloudEndure user's AWS Identity and Access Management (IAM) policy has all permissions needed to run Amazon Elastic Compute Cloud (Amazon EC2) API operations.
  • Confirm that the replication server communicates with Amazon EC2 endpoints within the Region.
  • Identify any network connectivity blockers.
  • Check for recent changes in Replication Settings.
  • Make sure that you're using the correct proxy settings.
  • Confirm that the CloudEndure Agent works properly.
  • Check for service quota issues.
  • Verify that the CloudEndure machine licenses aren't expired.

Verify that the CloudEndure user's IAM policy has all permissions to run the required Amazon EC2 API operations

For a sample policy, see the IAM sample policy. Or, view the AWS CloudTrail Event history to check for API calls that failed for the configured CloudEndure IAM user.
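The CloudTrail check can also be done from the AWS CLI. The following is a rough sketch, not an official procedure: it uses the aws cloudtrail lookup-events command and counts the errorCode values found in the returned event records. The function name failed_api_calls and the user name ce-user are placeholders.

```shell
# failed_api_calls: read CloudTrail event records (JSON text) on stdin and
# print each distinct errorCode with the number of times it appears.
failed_api_calls() {
  grep -o '"errorCode": *"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort | uniq -c | sort -rn
}

# Usage (replace ce-user with your CloudEndure IAM user name):
#   aws cloudtrail lookup-events \
#     --lookup-attributes AttributeKey=Username,AttributeValue=ce-user \
#     --query 'Events[].CloudTrailEvent' --output text | failed_api_calls
```

Codes such as Client.UnauthorizedOperation usually point to a missing permission in the IAM policy.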

Confirm that the replication server communicates with Amazon EC2 endpoints within the Region

1.    Launch a new Linux machine in the same subnet as your staging area.

2.    Log in to the new machine and then run the following commands to test connectivity. In the following example commands, replace us-east-1 with your Region.

$ dig ec2.us-east-1.amazonaws.com
$ telnet ec2.us-east-1.amazonaws.com 443
$ wget https://ec2.us-east-1.amazonaws.com

If any of the preceding commands fail, then network connectivity issues exist. Proceed to the following section.
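If telnet isn't installed on the test machine, the port check can be approximated with bash alone. The following sketch makes a few assumptions: the function names ec2_endpoint and can_reach_ec2 are hypothetical, the machine has bash and the coreutils timeout command, and the 5-second timeout is an arbitrary choice. Note that China Regions use the amazonaws.com.cn domain.

```shell
# ec2_endpoint: print the Regional Amazon EC2 endpoint host name for a Region.
# China Regions (cn-*) use the amazonaws.com.cn domain.
ec2_endpoint() {
  case "$1" in
    cn-*) printf 'ec2.%s.amazonaws.com.cn\n' "$1" ;;
    *)    printf 'ec2.%s.amazonaws.com\n' "$1" ;;
  esac
}

# can_reach_ec2: succeed if a TCP connection to the endpoint on port 443
# opens within 5 seconds. Uses bash's /dev/tcp redirection.
can_reach_ec2() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$(ec2_endpoint "$1")/443"
}

# Usage:
#   can_reach_ec2 us-east-1 && echo "reachable" || echo "blocked"
```

This checks only that the TCP connection opens. It doesn't validate DNS configuration details or the TLS handshake, so the dig and wget commands are still useful.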

Identify any network connectivity blockers

Verify that the VPC, subnet, security group, network access control list (ACL), and route table settings align with the Replication Settings. A misalignment might block communication to EC2 endpoints from the replication servers.

If the replication server launches in a public subnet, then do the following:

1.    Verify that the security group, network ACLs, and route table allow communication with EC2 endpoints on TCP port 443.

2.    Verify that the enableDnsHostnames and enableDnsSupport attributes are set to true at the VPC level:

$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsHostnames
$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsSupport
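To turn the two attribute checks into a single pass or fail result, you can wrap them in a function. This is a sketch under the assumption that the AWS CLI is configured for the right account and Region. The function name vpc_dns_ok is hypothetical, and the VPC ID is the example ID from the commands above.

```shell
# vpc_dns_ok: succeed only if both DNS attributes are true for the VPC.
vpc_dns_ok() {
  local vpc_id="$1"
  local hostnames support
  hostnames="$(aws ec2 describe-vpc-attribute --vpc-id "$vpc_id" \
    --attribute enableDnsHostnames \
    --query 'EnableDnsHostnames.Value' --output text)"
  support="$(aws ec2 describe-vpc-attribute --vpc-id "$vpc_id" \
    --attribute enableDnsSupport \
    --query 'EnableDnsSupport.Value' --output text)"
  # Compare case-insensitively so that True and true both count.
  [ "$(printf '%s' "$hostnames" | tr 'A-Z' 'a-z')" = "true" ] &&
    [ "$(printf '%s' "$support" | tr 'A-Z' 'a-z')" = "true" ]
}

# Usage:
#   vpc_dns_ok vpc-a01106c2 && echo "DNS attributes OK" || echo "Check DNS attributes"
```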

If the replication server launches in a private subnet, then do the following:

1.    Verify that the security group, network ACLs, and route table allow communication with EC2 endpoints on TCP port 443.

2.    For NAT Gateway or NAT instance configurations in the route table, verify that outbound traffic to the EC2 endpoint on TCP port 443 flows correctly.

3.    Some configurations are set up so that outbound traffic passes through a transit gateway or a virtual private gateway. If this is the case, then verify that the route table allows outbound traffic to reach Regional Amazon EC2 endpoints on TCP port 443.

4.    Check whether an internal or external firewall blocks communication.

5.    If the VPC has interface VPC endpoints, then make sure that communication with EC2 endpoints on TCP port 443 occurs through the private network. To do this:

  • Verify that the security group that's associated with the VPC endpoint allows incoming traffic from the replication instance on TCP port 443.

  • Verify that the enableDnsHostnames and enableDnsSupport attributes are set to true at the VPC level. Also, verify that the PrivateDnsEnabled value is set to true on the VPC interface endpoints.

$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsHostnames
$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsSupport
$ aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-088d25a4bbf4a7abc
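The describe-vpc-endpoints output is verbose, so it can help to query only the PrivateDnsEnabled value. The following sketch assumes a configured AWS CLI. The function name private_dns_enabled is hypothetical, and the endpoint ID is the example ID from the commands above.

```shell
# private_dns_enabled: succeed if the interface endpoint has private DNS on.
private_dns_enabled() {
  local value
  value="$(aws ec2 describe-vpc-endpoints --vpc-endpoint-ids "$1" \
    --query 'VpcEndpoints[0].PrivateDnsEnabled' --output text)"
  # Compare case-insensitively so that True and true both count.
  [ "$(printf '%s' "$value" | tr 'A-Z' 'a-z')" = "true" ]
}

# Usage:
#   private_dns_enabled vpce-088d25a4bbf4a7abc || echo "Private DNS is off"
```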

Check for recent changes in Replication Settings

You can track changes to Replication Settings from the CloudEndure Event Log. For example, check whether the Staging Area Tags field contains a tag that's not valid. For a list of allowed characters, see Tag restrictions.
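A quick local check can catch characters that tags don't allow before you save the setting. The following sketch is a stricter approximation of the documented rule: it accepts only ASCII letters, digits, spaces, and the characters + - = . _ : / @, whereas the real restriction also allows non-ASCII letters. The function name tag_chars_ok is hypothetical, and the check assumes a single-line string.

```shell
# tag_chars_ok: succeed if the string is non-empty and uses only ASCII
# letters, digits, spaces, and the characters + - = . _ : / @
tag_chars_ok() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9 +=._:/@-]+$'
}

# Usage:
#   tag_chars_ok 'env#prod' || echo "tag contains a character that's not allowed"
```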

Make sure that you're using the correct proxy settings

1.    For replication servers using a proxy server, verify that the settings on the proxies allow communication with Regional Amazon EC2 endpoints on TCP port 443.

2.    Make sure that the allowed list for SSL interception and authentication includes console.cloudendure.com. For more information, see Defining replication settings for AWS and review the Define the proxy section.

Confirm that the CloudEndure Agent works correctly

Confirm that the CloudEndure Agent works correctly on the source machine. You can check the CloudEndure Agent logs for possible errors to help pinpoint any problems.
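One way to scan the agent log is to filter it for lines that mention errors. The following sketch reuses the agent.log.0 path shown earlier in this article. The function name recent_agent_errors is hypothetical, and matching on the word "error" is a guess at what to look for, not a documented log format.

```shell
# recent_agent_errors: read log lines on stdin and print the last N
# (default 20) that mention "error", ignoring case.
recent_agent_errors() {
  grep -i 'error' | tail -n "${1:-20}"
}

# Usage on the source machine:
#   sudo cat /var/lib/cloudendure/agent.log.0 | recent_agent_errors 10
```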

Check for Amazon EC2 service quota issues

Service quota issues or API throttling and rate limit issues might prevent CloudEndure from creating the first launchable recovery snapshot. Check the CloudTrail Event history to determine if a service quota or throttling issue exists.
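When you review the CloudTrail events, the errorCode field is usually the quickest signal. The following sketch classifies an error code by name. The function name is_quota_or_throttle_error is hypothetical, and the name patterns are a heuristic based on codes such as RequestLimitExceeded, not an exhaustive list.

```shell
# is_quota_or_throttle_error: succeed if an errorCode looks like a quota
# or throttling failure. Heuristic match on the code's name only.
is_quota_or_throttle_error() {
  case "$1" in
    *LimitExceeded*|*RateExceeded*|*Throttl*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_quota_or_throttle_error "Client.RequestLimitExceeded" && echo "quota or throttling issue"
```

A match suggests that you should check the relevant service quota or reduce the API call rate. Other codes point to a different cause, such as permissions.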

For more information, see Amazon EC2 service quotas.

Verify that the CloudEndure machine licenses aren't expired

Confirm that the CloudEndure License Packages are active. For more information, see Expiration dates of License Packages.

