Why is my CloudEndure replication process stuck at 100% with the "Finalizing Initial Sync" message appearing in the console?

Last updated: 2020-12-07

I'm using CloudEndure Migration or CloudEndure Disaster Recovery. The replication process is stuck at 100% with the message "Finalizing Initial Sync" in the CloudEndure console. How do I troubleshoot this?

Short description

There are two messages that you might see in the CloudEndure console if the replication process is stuck at 100% during the Finalizing Initial Sync stage:

  • "Finalizing Initial Sync - Flushing Backlog"
  • "Finalizing Initial Sync - Creating First Launchable Snapshot"

Each message has different causes. The following sections describe how to troubleshoot each one.

Resolution

Finalizing Initial Sync - Flushing Backlog

Wait for the backlog to finish flushing. The initial sync completes after the backlog is fully flushed.
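As a rough estimate, the backlog can drain only while the replication bandwidth is greater than the rate at which the source machine writes new data. The following bash sketch applies that simple model. The function name `estimate_flush_minutes` and its three inputs (backlog size in MB, available bandwidth in Mbps, and write rate in Mbps) are hypothetical values that you measure yourself, not figures that CloudEndure reports in this form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how long a replication backlog takes to flush.
# All inputs are assumptions that you measure yourself:
#   $1 backlog size in megabytes (MB)
#   $2 replication bandwidth available to the source machine, in Mbps
#   $3 average rate at which the source machine writes new data, in Mbps
# Prints the estimate in minutes, or "never" when writes keep pace with or
# outrun replication (the backlog cannot shrink in that case).
estimate_flush_minutes() {
  local backlog_mb=$1 bandwidth_mbps=$2 write_mbps=$3
  awk -v b="$backlog_mb" -v bw="$bandwidth_mbps" -v w="$write_mbps" 'BEGIN {
    net = bw - w                              # Mbps left over to drain the backlog
    if (net <= 0) { print "never"; exit }
    printf "%.1f\n", (b * 8 / net) / 60       # MB -> megabits, seconds -> minutes
  }'
}
```

For example, `estimate_flush_minutes 6000 100 20` prints `10.0`: 6,000 MB is 48,000 megabits, and 80 Mbps of spare bandwidth drains it in 600 seconds. If the write rate matches or exceeds the bandwidth, the function prints `never`.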

If the source machine is very write-intensive, the backlog can grow faster than it is flushed. As a result, one or more machines might remain stuck in the Finalizing Initial Sync state in the CloudEndure console. If this occurs, do the following:

1.    Test the replication speed. Then, calculate the bandwidth required to keep up with the rate at which the source machine writes data, and allocate that bandwidth to the source machine.

2.    Verify that the Network bandwidth throttling option under Replication Settings is disabled. If your configuration requires this option, make sure that you set its value to at least the minimum required bandwidth. For more information, see [Optional] Enable network bandwidth throttling.

3.    Check the network and disk utilization of the replication server using Amazon CloudWatch metrics. If network or disk resources are throttling the server, use a dedicated replication server or choose fast SSD data disks instead of HDD disks.

To verify which replication server a specific source machine uses, run the netstat command on the source machine, as shown in the following example. Note the remote IP address that the machine connects to on TCP port 1500.

$ netstat -anp | grep ":1500"

Or, review the agent.log.0 file on the source machine to identify the exact replication server in use:

$ sudo grep ":1500" /var/lib/cloudendure/agent.log.0 | tail -n 1
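If the source machine has many open connections, a small filter can pull out only the replication server addresses. This is a sketch that assumes Linux `netstat` output, where the fifth column is the remote (foreign) address; `replication_server_ips` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netstat -anp` output on stdin and print the
# unique remote addresses of TCP connections whose remote port is 1500.
# On Linux netstat, column 5 is the "Foreign Address" (remote ip:port).
replication_server_ips() {
  awk '$1 ~ /^tcp/ && $5 ~ /:1500$/ {
    sub(/:1500$/, "", $5)   # strip the port, keep the address
    print $5
  }' | sort -u
}
```

Usage: `sudo netstat -anp | replication_server_ips`. Each address printed is a replication server that the source machine is currently connected to.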

Finalizing Initial Sync - Creating First Launchable Snapshot

To troubleshoot this message, do the following:

  • Verify that the CloudEndure user's AWS Identity and Access Management (IAM) policy has all permissions to run the required Amazon Elastic Compute Cloud (Amazon EC2) APIs.
  • Confirm that the replication server communicates with Amazon EC2 endpoints within the Region.
  • Identify any network connectivity blockers.
  • Check for recent changes in Replication Settings.
  • Make sure that you're using the correct proxy settings.
  • Confirm that the CloudEndure Agent works correctly.
  • Check for Amazon EC2 service quota issues.

Verify that the CloudEndure user's IAM policy has all permissions to run the required Amazon EC2 APIs

For a sample policy, see the IAM sample policy. Also, review the AWS CloudTrail Event history for failed API calls made by the configured CloudEndure IAM user.
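The CloudTrail check can be scripted. The following bash sketch assumes that the AWS CLI is installed and configured for the Region of your staging area. `recent_events_for_user` and `failed_calls` are hypothetical helper names, the 200-event cap is arbitrary, and the text matching is a rough filter rather than a full JSON parser.

```shell
#!/usr/bin/env bash
# Sketch (assumes a configured AWS CLI): print recent CloudTrail records for
# one IAM user, one JSON record per line. $1 is the CloudEndure IAM user name.
recent_events_for_user() {
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=Username,AttributeValue="$1" \
    --max-items 200 \
    --query 'Events[].CloudTrailEvent' --output text | tr '\t' '\n'
}

# Reads CloudTrail records on stdin and prints "eventName errorCode" for
# every record that contains an errorCode field.
failed_calls() {
  local line name code
  while IFS= read -r line; do
    code=$(grep -o '"errorCode" *: *"[^"]*"' <<<"$line" | sed 's/.*"\([^"]*\)"$/\1/')
    [ -n "$code" ] || continue
    name=$(grep -o '"eventName" *: *"[^"]*"' <<<"$line" | sed 's/.*"\([^"]*\)"$/\1/')
    printf '%s %s\n' "$name" "$code"
  done
}
```

Usage: `recent_events_for_user cloudendure-user | failed_calls`, where `cloudendure-user` is a placeholder for your CloudEndure IAM user name. Error codes such as `AccessDenied` or `Client.UnauthorizedOperation` usually point to a missing permission in the policy.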

Confirm that the replication server communicates with Amazon EC2 endpoints within the Region

1.    Launch a new Linux machine in the same subnet as your staging area.

2.    Log in to the new machine and then run the following commands to test connectivity. In the following example commands, replace us-east-1 with your Region.

$ dig ec2.us-east-1.amazonaws.com
$ telnet ec2.us-east-1.amazonaws.com 443
$ wget https://ec2.us-east-1.amazonaws.com

If any of the above commands fail, a network connectivity issue exists. Proceed to the following section.
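The telnet test is interactive and waits for input until you close it. The following non-interactive sketch covers the DNS and TCP port 443 checks; it assumes that bash (for its `/dev/tcp` redirection), `getent`, and coreutils `timeout` are available on the test machine. `ec2_endpoint` and `check_ec2_endpoint` are hypothetical helper names. Keep the wget command for the HTTPS check.

```shell
#!/usr/bin/env bash
# Build the regional EC2 endpoint name, e.g. ec2.us-east-1.amazonaws.com.
ec2_endpoint() {
  printf 'ec2.%s.amazonaws.com\n' "$1"
}

# Non-interactive DNS and TCP 443 checks for the regional EC2 endpoint.
# $1 is the Region. Returns 1 at the first failed check.
check_ec2_endpoint() {
  local host
  host=$(ec2_endpoint "$1")
  # 1. DNS resolution (same purpose as `dig`).
  if getent hosts "$host" >/dev/null; then
    echo "DNS  ok   $host"
  else
    echo "DNS  FAIL $host"; return 1
  fi
  # 2. TCP connection on port 443 (replaces interactive `telnet`), 5 s limit.
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/443" 2>/dev/null; then
    echo "TCP  ok   $host:443"
  else
    echo "TCP  FAIL $host:443"; return 1
  fi
}
```

Usage: `check_ec2_endpoint us-east-1`, replacing us-east-1 with your Region.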

Identify any network connectivity blockers

Verify that the VPC, subnet, security group, network access control list (ACL), and route table settings align with the Replication Settings. A misalignment might block communication to Amazon EC2 endpoints from the replication servers.

If the replication server launches in a public subnet, do the following:

1.    Verify that the security group, network ACLs, and route table allow communication with Amazon EC2 endpoints on TCP port 443.

2.    Verify that the enableDnsHostnames and enableDnsSupport attributes are set to true at the VPC level:

$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsHostnames
$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsSupport

If the replication server launches in a private subnet, do the following:

1.    Verify that the security group, network ACLs, and route table allow communication with Amazon EC2 endpoints on TCP port 443.

2.    If you configured a NAT Gateway or NAT instance in the route table, verify that outbound traffic to the Amazon EC2 endpoint on TCP port 443 flows correctly.

3.    If you configured outbound traffic to pass through a transit gateway or a virtual private gateway, verify that the route table allows outbound traffic to reach regional Amazon EC2 endpoints on TCP port 443.

4.    Verify that no internal or external firewall blocks this communication.

5.    If the VPC has interface VPC endpoints, make sure that communication with the Amazon EC2 endpoints on TCP port 443 occurs through the private network. To do this:

  • Verify that the security group associated with the VPC endpoint allows inbound traffic from the replication server on TCP port 443.
  • Verify that the enableDnsHostnames and enableDnsSupport attributes are set to true at the VPC level. Also, verify that the PrivateDnsEnabled value is set to true on the VPC interface endpoints.

$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsHostnames
$ aws ec2 describe-vpc-attribute --vpc-id vpc-a01106c2 --attribute enableDnsSupport
$ aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-088d25a4bbf4a7abc
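The attribute checks can be combined into one function that prints only the values you need. This sketch assumes that the AWS CLI is configured for the staging area's Region and that bash 4 or later is available (for `${attr^}`). Instead of a specific endpoint ID, it looks up EC2 interface endpoints in the VPC by service name (`com.amazonaws.<region>.ec2`). `check_vpc_dns` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the DNS attributes of a VPC and the PrivateDnsEnabled flag of
# its EC2 interface endpoints. $1 is the VPC ID, $2 is the Region.
check_vpc_dns() {
  local vpc=$1 region=$2 attr value
  for attr in enableDnsHostnames enableDnsSupport; do
    # The response key is the attribute name with an upper-case first letter.
    value=$(aws ec2 describe-vpc-attribute --vpc-id "$vpc" --attribute "$attr" \
      --query "${attr^}.Value" --output text)
    echo "$attr=$value"
  done
  # EC2 interface endpoints in this VPC; expect True for each one.
  value=$(aws ec2 describe-vpc-endpoints \
    --filters "Name=vpc-id,Values=$vpc" "Name=service-name,Values=com.amazonaws.$region.ec2" \
    --query 'VpcEndpoints[].PrivateDnsEnabled' --output text)
  echo "ec2 endpoint PrivateDnsEnabled=${value:-none}"
}
```

Usage: `check_vpc_dns vpc-a01106c2 us-east-1`. All three values should be `True`; `none` means that the VPC has no EC2 interface endpoint.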

Check for recent changes in Replication Settings

You can track changes to Replication Settings in the CloudEndure Event Log. For example, check whether an invalid tag was added to the Staging Area Tags field. For a list of allowed characters, see Tag restrictions.
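To check a tag before you enter it, you can test it against the Amazon EC2 tag restrictions. This bash sketch encodes those restrictions as commonly documented (letters, numbers, spaces, and the characters + - = . _ : / @; keys of 1 to 128 characters; values of up to 256 characters; no key prefix of aws: in any case). Treat the Tag restrictions page as authoritative. `valid_tag` is a hypothetical helper name, and `[[:alnum:]]` matching depends on your locale.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if the tag key/value pair passes the restrictions listed
# above, 1 otherwise. $1 is the tag key, $2 is the tag value. Requires bash 4+.
valid_tag() {
  local key=$1 value=$2
  # Letters, numbers, space, and + = . _ : / @ - (hyphen last = literal).
  local re='^[[:alnum:] +=._:/@-]*$'
  (( ${#key} >= 1 && ${#key} <= 128 )) || return 1
  (( ${#value} <= 256 )) || return 1
  [[ ${key,,} != aws:* ]] || return 1     # reserved prefix, any case
  [[ $key =~ $re && $value =~ $re ]]
}
```

Usage: `valid_tag "Project" "CloudEndure-Migration" && echo valid`.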

Make sure that you're using the correct proxy settings

1.    If your replication servers use a proxy server, make sure that the settings on the proxies allow communication with regional Amazon EC2 endpoints on TCP port 443.

2.    Make sure that the allowed list for SSL interception and authentication includes console.cloudendure.com. For more information, see the Define the proxy section in Defining replication settings for AWS.
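To confirm that the proxy passes HTTPS traffic to these hosts, you can run curl through it from a machine that uses the same proxy. This is a sketch; the proxy URL in the usage example is a hypothetical placeholder, and `check_via_proxy` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: test HTTPS reachability of one or more URLs through a proxy.
# $1 is the proxy URL; the remaining arguments are the URLs to test.
check_via_proxy() {
  local proxy=$1 url code
  shift
  for url in "$@"; do
    # -sS: quiet but show errors; -o /dev/null: discard body; -w: print status.
    if code=$(curl -sS --max-time 10 --proxy "$proxy" -o /dev/null -w '%{http_code}' "$url"); then
      echo "ok   $url (HTTP $code)"
    else
      # A TLS certificate error here can mean the proxy intercepts SSL for this host.
      echo "FAIL $url"
    fi
  done
}
```

Usage: `check_via_proxy http://proxy.example.internal:3128 https://console.cloudendure.com https://ec2.us-east-1.amazonaws.com`. Any HTTP status code means that the TLS connection through the proxy succeeded; `FAIL` means the connection was blocked, timed out, or failed certificate validation.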

Confirm that the CloudEndure Agent works correctly

Confirm that the CloudEndure Agent works correctly on the source machine. Check the CloudEndure Agent logs for errors that can help pinpoint the problem.
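On a Linux source machine, a quick filter can surface recent error lines from the agent log. This sketch defaults to the agent.log.0 path used earlier in this article; the matched keywords (error, exception, fail) are an assumption, not a documented log format. `agent_log_errors` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the last N lines that look like errors in a CloudEndure Agent
# log. $1 is the log path (default below), $2 is N (default 20).
agent_log_errors() {
  local log=${1:-/var/lib/cloudendure/agent.log.0} n=${2:-20}
  grep -iE 'error|exception|fail' "$log" | tail -n "$n"
}
```

Usage: `sudo bash -c "$(declare -f agent_log_errors); agent_log_errors"`, or run the grep command directly with sudo if the log isn't readable by your user.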

Check for Amazon EC2 service quota issues

Reaching an Amazon EC2 service quota, or API throttling and rate limiting, might prevent CloudEndure from creating the first launchable snapshot. Check the CloudTrail Event history for errors that indicate a service quota or throttling issue.

For more information, see Amazon EC2 service quotas.
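The following bash sketch summarizes throttling and limit errors from recent EC2 events in CloudTrail. It assumes that the AWS CLI is configured for the staging area's Region. The error-code pattern (anything containing LimitExceeded or Throttl, such as Client.RequestLimitExceeded) is an assumption that you might need to adjust, and `recent_ec2_events` and `limit_errors` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch (assumes a configured AWS CLI): recent CloudTrail records from the
# EC2 event source, one JSON record per line.
recent_ec2_events() {
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventSource,AttributeValue=ec2.amazonaws.com \
    --max-items 500 \
    --query 'Events[].CloudTrailEvent' --output text | tr '\t' '\n'
}

# Reads CloudTrail records on stdin and prints each error code that looks like
# a throttling or quota problem, prefixed by how often it occurs.
limit_errors() {
  grep -oE '"errorCode" *: *"[^"]*(LimitExceeded|Throttl)[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' | sort | uniq -c | sort -rn
}
```

Usage: `recent_ec2_events | limit_errors`. An empty result means that none of the retrieved events matched the pattern.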

