How can I troubleshoot common issues that cause my Lightsail instance to be unresponsive?
Last updated: 2021-10-12
My Amazon Lightsail instance is unresponsive. What are some common reasons for this and how do I troubleshoot them?
If your instance is unresponsive, review the instance's status check metrics to determine which troubleshooting steps apply. Amazon Lightsail monitors the health of each instance with two status checks:
System status check: The system status check detects issues with the underlying host that your instance runs on. If the underlying host is unresponsive or unreachable due to network, hardware, or software issues, then this status check fails.
Instance status check: An instance status check failure indicates a problem with the instance due to operating system-level errors. OS-level errors include the following:
- Failure to boot the operating system.
- Failure to mount volumes correctly.
- File system issues.
- Incompatible drivers.
- Kernel panic.
Instance status checks might also fail due to over-utilization of resources. The following are three of the most common reasons that an instance status check fails due to over-utilization of resources:
- Your instance might operate in the burstable zone when under heavy load. This can make the instance unresponsive or cause the instance to crash.
- The root device is 100% full and the instance becomes stuck while booting.
- The processes running on the instance used all its memory, preventing the kernel from running.
View the status check metrics of your instance to determine if the instance failed the system status check or the instance status check.
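If you prefer the command line to the Lightsail console, the status check metrics can also be read with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured and that GNU date is available. The function name lightsail_status_check and the instance name are hypothetical placeholders.

```shell
# Hypothetical helper: sum failed status checks over the last hour.
# $1 = instance name (placeholder), $2 = metric name:
#      StatusCheckFailed_System or StatusCheckFailed_Instance
lightsail_status_check() {
  aws lightsail get-instance-metric-data \
    --instance-name "$1" \
    --metric-name "$2" \
    --period 300 \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --unit Count \
    --statistics Sum
}
```

For example, run lightsail_status_check my-instance StatusCheckFailed_System and then repeat with StatusCheckFailed_Instance. A nonzero sum in a datapoint suggests that the corresponding check failed during that five-minute period.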
System status check failure
If the system status check failed, migrate the instance to a new, healthy host by manually stopping and then starting the instance.
Note: A stop and start isn't equivalent to a reboot. A reboot keeps the instance on the same host. A stop followed by a start is required to migrate the instance to healthy hardware.
Warning: Before stopping and starting your instance, be aware that the public IP address of the instance changes on every instance stop and start. If you want a public IP that doesn't change on every stop and start of the instance, then you can attach a static IP address.
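The stop and start can also be scripted with the AWS CLI. The following is a sketch under the assumption that the AWS CLI is configured for the account that owns the instance. The function names, the instance name, and the static IP name are hypothetical, and the 10-second polling interval is an arbitrary choice.

```shell
# Hypothetical helper: stop the instance, wait until it is stopped, then start it.
# $1 = instance name (placeholder)
migrate_to_new_host() {
  aws lightsail stop-instance --instance-name "$1"
  # Poll the instance state until Lightsail reports "stopped".
  until [ "$(aws lightsail get-instance-state --instance-name "$1" \
             --query 'state.name' --output text)" = "stopped" ]; do
    sleep 10
  done
  aws lightsail start-instance --instance-name "$1"
}

# Hypothetical helper: allocate a static IP and attach it to the instance
# so that the public IP no longer changes on stop and start.
# $1 = instance name, $2 = static IP name (both placeholders)
attach_static_ip() {
  aws lightsail allocate-static-ip --static-ip-name "$2"
  aws lightsail attach-static-ip --static-ip-name "$2" --instance-name "$1"
}
```

For example: attach_static_ip my-instance my-static-ip, followed by migrate_to_new_host my-instance.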
Instance status check failure
If the instance status check failed, it might be due to operating system-level issues causing boot errors or over-utilization of the instance's resources. The following are common reasons for instance status check failure:
High CPU usage
View the instance's CPUUtilization metric. Note whether the CPU utilization is above the sustainable zone, meaning your instance is operating in the burstable zone and is under heavy load. If this is the case, use the following options to troubleshoot:
- Reboot your instance to return it to a healthy status.
Note: If your instance CPU requirements are higher than what your current instance plan can offer, then the problem will occur again after a reboot.
- Consider switching to a bigger instance plan that meets your CPU requirements.
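To check CPU usage without the console, the CPUUtilization metric can be queried with the AWS CLI. The following is a rough sketch, assuming a configured AWS CLI and GNU date. The function name lightsail_cpu_metric and the six-hour window are illustrative choices, not part of the original procedure. The same call also accepts BurstCapacityPercentage, which indicates how much burst capacity remains.

```shell
# Hypothetical helper: average CPU-related metric over the last six hours.
# $1 = instance name (placeholder)
# $2 = metric name: CPUUtilization or BurstCapacityPercentage
lightsail_cpu_metric() {
  aws lightsail get-instance-metric-data \
    --instance-name "$1" \
    --metric-name "$2" \
    --period 300 \
    --start-time "$(date -u -d '6 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --unit Percent \
    --statistics Average
}
```

For example, run lightsail_cpu_metric my-instance CPUUtilization. Sustained averages above the sustainable zone for your plan, together with a falling burst capacity percentage, suggest that the plan is too small for the workload.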
Memory exhaustion
When memory is exhausted, the kernel doesn't have enough memory to run. When this occurs, other processes are stopped to free memory, which can make the instance unresponsive. Try rebooting or stopping and starting the instance. Either action clears the memory in use, but the problem can recur if the workload needs more memory than your instance plan provides.
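After the instance responds again, you can check how much memory is still available from inside the instance. The following is a minimal sketch for Linux instances, assuming a kernel that reports MemAvailable in /proc/meminfo. The function name mem_available_percent is hypothetical.

```shell
# Hypothetical helper: print available memory as a whole-number percentage.
# $1 = optional path to a meminfo-style file (defaults to /proc/meminfo)
mem_available_percent() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END              { printf "%d\n", avail * 100 / total }' \
    "${1:-/proc/meminfo}"
}
```

Run mem_available_percent with no arguments on the instance. A value that stays in the low single digits under normal load suggests that the workload needs more memory than the plan provides.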
Disk full errors
If there's no space left on the device and the file system has reached its capacity, the instance might have entered emergency mode because the root device is full. To resolve this, you can upgrade to a Lightsail plan (bundle) with a larger disk.
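If you can still connect to the instance, you can confirm how full the root file system is before changing plans. The following is a small sketch that uses the POSIX output format of df. The function name root_usage_percent is hypothetical.

```shell
# Hypothetical helper: print the used percentage of the root file system
# as a bare number (for example, 87).
root_usage_percent() {
  df -P / | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}
```

For example, [ "$(root_usage_percent)" -ge 95 ] && echo "root file system is almost full" prints a warning when usage is at or above 95%. The threshold is an arbitrary illustration.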
To upgrade your Lightsail plan to a larger instance, do the following:
3. After upgrading your Lightsail plan, connect to your instance.
4. Run the lsblk command to check the disk layout. Even though the disk size increases, a lack of free space might prevent the automatic process that grows the partition and file system from running. If this occurs, free some space, and then manually grow the partition followed by the file system. To do this, run the following commands:
Run the growpart command to grow the size of the root partition or partition 1:
$ sudo growpart /dev/xvda 1
Run the lsblk command to verify that partition 1 is expanded:
$ lsblk
Expand the file system. First, identify the file system type of your root partition "/" using the following command:
$ lsblk -f
In the following example, an EXT2/EXT3/EXT4 file system on partition 1 is expanded:
$ sudo resize2fs /dev/xvda1
In the following example, an XFS-type file system is expanded. In this example, "/" is the volume mount point.
$ sudo xfs_growfs -d /
After expanding the file system, run the df -h command to verify that the OS can see the additional space:
$ df -h
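The choice between resize2fs and xfs_growfs in the preceding steps depends on the file system type. The following sketch maps a file system type to the matching command from this article. It assumes the same device (/dev/xvda1) and mount point ("/") used in the examples above, and the function name resize_command_for is hypothetical. It only prints the command so that you can review it before running it.

```shell
# Hypothetical helper: print the resize command for a given file system type.
# $1 = file system type, for example ext4 or xfs
resize_command_for() {
  case "$1" in
    ext2|ext3|ext4) echo "sudo resize2fs /dev/xvda1" ;;
    xfs)            echo "sudo xfs_growfs -d /" ;;
    *)              echo "unsupported file system type: $1" >&2
                    return 1 ;;
  esac
}
```

For example, resize_command_for "$(findmnt -n -o FSTYPE /)" prints the command that matches the root file system on the instance.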
Other OS-level issues
Other issues include boot issues, kernel panic, and network failure. There might also be block device errors, software bugs, stuck tasks, or unusual system issues. All of these can result in an unresponsive instance. Try to reboot or stop and start the instance. If a reboot or a stop and start doesn't resolve the issue, you might need to migrate the Lightsail instance to EC2 for further troubleshooting. Troubleshooting options are limited in Lightsail because it is designed for simpler workloads.