My EC2 Linux instance is no longer responding or has boot issues. How do I use EC2Rescue for Linux to troubleshoot operating system-level issues?
Last updated: 2020-04-15
I can't connect to my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance, or I'm experiencing boot issues. To correct these problems, I need to fix common issues, such as OpenSSH file permissions, or gather operating system (OS) logs for analysis and troubleshooting. How can I use EC2Rescue for Linux to do this?
EC2Rescue for Linux is a tool that helps diagnose and troubleshoot problems on Amazon EC2 Linux instances. You run it on your Amazon EC2 Linux instance to correct operating system-level issues. It also collects advanced logs, system utilization reports, and configuration files for further analysis.
Common scenarios addressed by EC2Rescue for Linux:
- Collect system utilization reports, such as vmstat, iostat, and mpstat output.
- Collect logs and details such as syslog, dmesg, application error logs, and SSM logs.
- Detect system problems such as asymmetric routing or duplicate root device labels.
- Automatically remediate system problems such as correcting OpenSSH file permissions or disabling known problematic kernel parameters.
EC2Rescue for Linux requires an Amazon EC2 Linux instance that meets the following prerequisites:
Supported operating systems
- Amazon Linux 2
- Amazon Linux 2016.09+
- SLES 12+
- RHEL 7+
- Ubuntu 16.04+
Software requirements
- Python 2.7.9+ or 3.2+
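Before you start, you might want to confirm that the instance where you'll run the tool meets these prerequisites. The following POSIX shell sketch prints the distribution name from /etc/os-release and checks the first Python interpreter it finds against the minimum versions listed above. The helper names (version_ge, python_ok) are hypothetical, and the version comparison assumes GNU coreutils `sort -V`.

```shell
#!/bin/sh
# Sketch: check the EC2Rescue for Linux prerequisites listed above.
# Assumes GNU coreutils `sort -V`. Helper names are hypothetical.

# version_ge A B: succeed when dotted version A is greater than or equal to B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# python_ok VERSION: apply the documented minimum (2.7.9+ or 3.2+).
python_ok() {
  case "$1" in
    2.*) version_ge "$1" 2.7.9 ;;
    3.*) version_ge "$1" 3.2 ;;
    *) return 1 ;;
  esac
}

# Show the distribution name, if /etc/os-release exists.
if [ -r /etc/os-release ]; then
  . /etc/os-release
  echo "OS: ${PRETTY_NAME:-unknown}"
fi

# Check the first Python interpreter found on the PATH.
for py in python3 python2 python; do
  if command -v "$py" >/dev/null 2>&1; then
    ver=$("$py" -c 'import sys; print("%d.%d.%d" % tuple(sys.version_info[:3]))')
    if python_ok "$ver"; then
      echo "Python $ver: OK"
    else
      echo "Python $ver: below the required minimum"
    fi
    break
  fi
done
```

Comparing with `sort -V` instead of comparing strings means that, for example, 3.10 correctly sorts after 3.2.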
To troubleshoot an unreachable Amazon EC2 Linux instance with the help of a rescue instance:
1. Launch a new Amazon EC2 instance in your virtual private cloud (VPC) using the same Amazon Machine Image (AMI) and in the same Availability Zone as the impaired instance. The new instance becomes your "rescue" instance.
Or, you can use an existing instance that you can access, if it uses the same AMI and is in the same Availability Zone as your impaired instance.
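If you prefer the AWS CLI for step 1, the following sketch looks up the impaired instance's AMI, Availability Zone, and subnet, and then launches a rescue instance from the same AMI. Because a subnet belongs to exactly one Availability Zone, launching into the same subnet also places the rescue instance in the same Availability Zone. The instance ID, instance type (t3.micro), and key pair name are placeholders. By default the script runs in dry-run mode (DRY_RUN=1) and only prints the commands.

```shell
#!/bin/sh
# Sketch: launch a rescue instance from the impaired instance's AMI and subnet.
# DRY_RUN=1 (the default) only prints the AWS CLI commands.
# IMPAIRED_ID, RESCUE_TYPE, and KEY_NAME defaults are placeholders.

IMPAIRED_ID=${IMPAIRED_ID:-i-0123456789abcdef0}
RESCUE_TYPE=${RESCUE_TYPE:-t3.micro}
KEY_NAME=${KEY_NAME:-my-key-pair}
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# describe_impaired ID: print "AMI AZ SUBNET" for the given instance.
describe_impaired() {
  run aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].[ImageId,Placement.AvailabilityZone,SubnetId]' \
    --output text
}

# launch_rescue AMI SUBNET: print the new rescue instance ID.
launch_rescue() {
  run aws ec2 run-instances --image-id "$1" --subnet-id "$2" \
    --instance-type "$RESCUE_TYPE" --key-name "$KEY_NAME" \
    --query 'Instances[0].InstanceId' --output text
}

if [ "$DRY_RUN" = 1 ]; then
  describe_impaired "$IMPAIRED_ID"
  launch_rescue '<ami-id>' '<subnet-id>'
else
  # Word-split the tab-separated output into $1=AMI, $2=AZ, $3=subnet.
  set -- $(describe_impaired "$IMPAIRED_ID")
  launch_rescue "$1" "$3"
fi
```

Set DRY_RUN=0 to run the commands. Choose an instance type that supports the AMI's architecture.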
2. Stop the impaired instance, and then detach its Amazon Elastic Block Store (Amazon EBS) root volume (/dev/xvda or /dev/sda1).
3. Attach the EBS volume to the rescue instance as a secondary device (/dev/sdf).
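Steps 2 and 3 can also be scripted with the AWS CLI. The following sketch stops the impaired instance, waits for it to stop, detaches the root volume, waits for the volume to become available, and then attaches it to the rescue instance as /dev/sdf. The instance and volume IDs are placeholders, and the script only prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: move the impaired instance's root volume to the rescue instance.
# DRY_RUN=1 (the default) only prints the commands. IDs are placeholders.

DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# move_root_volume IMPAIRED_ID VOLUME_ID RESCUE_ID
move_root_volume() {
  # The root volume can be detached only after the instance has stopped.
  run aws ec2 stop-instances --instance-ids "$1" &&
    run aws ec2 wait instance-stopped --instance-ids "$1" &&
    run aws ec2 detach-volume --volume-id "$2" &&
    run aws ec2 wait volume-available --volume-ids "$2" &&
    run aws ec2 attach-volume --volume-id "$2" --instance-id "$3" --device /dev/sdf
}

move_root_volume "${IMPAIRED_ID:-i-0123456789abcdef0}" \
  "${ROOT_VOLUME_ID:-vol-0123456789abcdef0}" \
  "${RESCUE_ID:-i-0fedcba9876543210}"
```

Chaining the commands with && stops the sequence at the first failure. This prevents, for example, an attach attempt on a volume that never finished detaching.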
4. Connect to the rescue instance using SSH.
5. Create a mount point directory (/rescue) for the new volume attached to the rescue instance in step 3.
$ sudo mkdir /rescue
6. Mount the volume at the directory you created in step 5.
$ sudo mount /dev/xvdf1 /rescue
Note: The device (/dev/xvdf1) might be attached to the rescue instance with a different device name. For example, on Nitro-based instances, EBS volumes appear as NVMe devices, such as /dev/nvme1n1p1. To determine the correct device name, run the lsblk command, which lists your available disk devices and their mount points.
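Note: If the mount fails, check the output of dmesg | tail. If the logs indicate a UUID conflict, run the mount command again with the -o nouuid option (for example, sudo mount -o nouuid /dev/xvdf1 /rescue). This conflict is common with XFS file systems, because the rescue instance's root volume was created from the same AMI.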
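The following sketch combines steps 5 and 6 with the notes above. It lists the block devices, reads the partition's file system type with lsblk, and adds -o nouuid only for XFS, because XFS refuses to mount a second file system whose UUID matches one that is already mounted. The default device name /dev/xvdf1 is an assumption, so override RESCUE_DEVICE after checking lsblk. The helper names are hypothetical, and the script only prints the mount command unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: mount the impaired root partition at /rescue, adding -o nouuid for XFS only.
# DRY_RUN=1 (the default) prints the mount command instead of running it.
# RESCUE_DEVICE defaults to /dev/xvdf1, which is an assumption.

DRY_RUN=${DRY_RUN:-1}

# mount_opts FSTYPE: print extra mount options for the file system type.
mount_opts() {
  case "$1" in
    xfs) printf '%s\n' "-o nouuid" ;;
    *) printf '\n' ;;
  esac
}

# mount_rescue DEVICE DIR
mount_rescue() {
  fstype=$(lsblk -no FSTYPE "$1" 2>/dev/null) || fstype=""
  opts=$(mount_opts "$fstype")
  if [ "$DRY_RUN" = 1 ]; then
    echo "mount $opts $1 $2"
  else
    # $opts is intentionally unquoted so that "-o nouuid" becomes two arguments.
    mkdir -p "$2" && mount $opts "$1" "$2"
  fi
}

# Review the available devices before choosing RESCUE_DEVICE.
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT 2>/dev/null || true
mount_rescue "${RESCUE_DEVICE:-/dev/xvdf1}" /rescue
```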
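7. Switch to a root shell, bind-mount /proc, /sys, /dev, and /run from the rescue instance into the mounted volume, and then change the root directory (chroot) to the mounted volume.
$ sudo -i
# for i in proc sys dev run; do mount --bind /$i /rescue/$i ; done
# chroot /rescue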
8. Download and extract EC2Rescue for Linux on the offline root volume (inside the chroot environment).
$ curl -O https://s3.amazonaws.com/ec2rescuelinux/ec2rl.tgz
$ tar -xvf ec2rl.tgz
9. Verify the installation by displaying the help output.
$ cd ec2rl-<version_number>
$ ./ec2rl help
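To avoid typing the version number in step 9, you can read the extracted directory name from the archive itself. The following sketch assumes that ec2rl.tgz is gzip-compressed and contains a single top-level ec2rl-<version_number> directory, as step 9 implies. The helper name top_dir is hypothetical, and the script only prints the download command unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: download and extract EC2Rescue for Linux, then run its help command.
# DRY_RUN=1 (the default) only prints the download command.

DRY_RUN=${DRY_RUN:-1}
EC2RL_URL=https://s3.amazonaws.com/ec2rescuelinux/ec2rl.tgz

# top_dir TARBALL: print the top-level directory name in a gzip-compressed tarball,
# ignoring a leading "./" if the archive was created with one.
top_dir() {
  tar -tzf "$1" | sed 's|^\./||' | head -n 1 | cut -d/ -f1
}

if [ "$DRY_RUN" = 1 ]; then
  echo "curl -O $EC2RL_URL"
else
  curl -O "$EC2RL_URL" &&
    tar -xzf ec2rl.tgz &&
    cd "$(top_dir ec2rl.tgz)" &&
    ./ec2rl help
fi
```

If you run this as a script rather than pasting it into the chroot shell, the cd only affects the script's own shell.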
10. Run EC2Rescue for Linux with sudo and no options to run all modules.
$ sudo ./ec2rl run
11. View the results in /var/tmp/ec2rl. (Outside the chroot environment, this is /rescue/var/tmp/ec2rl on the rescue instance.)
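After several runs, the results directory can contain more than one set of output. The following sketch prints the most recently modified subdirectory and lists its contents. It assumes that each run writes to its own subdirectory under /var/tmp/ec2rl, and the helper name latest_run is hypothetical. To inspect the results from outside the chroot, set EC2RL_OUT=/rescue/var/tmp/ec2rl.

```shell
#!/bin/sh
# Sketch: show the most recent EC2Rescue for Linux results directory.
# Assumes one subdirectory per run under the base directory.

base=${EC2RL_OUT:-/var/tmp/ec2rl}

# latest_run BASE: print the most recently modified subdirectory of BASE (with a trailing /).
latest_run() {
  ls -1dt "$1"/*/ 2>/dev/null | head -n 1
}

dir=$(latest_run "$base")
if [ -n "$dir" ]; then
  echo "Latest results: $dir"
  ls -l "$dir"
else
  echo "No results found under $base"
fi
```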
12. Enable remediation for the supported modules based on the results.
$ ./ec2rl run --remediate
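13. After remediation is complete, exit the chroot environment, unmount the bind mounts created in step 7, unmount the secondary device, and then exit the root shell.
# exit
# for i in proc sys dev run; do umount /rescue/$i ; done
# umount /rescue
# exit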
Note: If the unmount operation isn't successful, you might have to stop or reboot the rescue instance to enable a clean unmount.
14. Detach the secondary volume (/dev/sdf) from the rescue instance. Then, attach it to the original instance as its root volume, using the same root device name that it had before (/dev/xvda or /dev/sda1).
15. Start the original instance, and then verify that it's responsive.
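Steps 14 and 15 can also be scripted with the AWS CLI. The following sketch detaches the repaired volume, attaches it to the original instance, starts the instance, and waits for its status checks to pass. The IDs are placeholders, and ROOT_DEVICE must match the original root device name (/dev/xvda or /dev/sda1). The script only prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: return the repaired root volume to the original instance and start it.
# DRY_RUN=1 (the default) only prints the commands. IDs are placeholders.

DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# restore_root_volume VOLUME_ID ORIGINAL_ID ROOT_DEVICE
restore_root_volume() {
  run aws ec2 detach-volume --volume-id "$1" &&
    run aws ec2 wait volume-available --volume-ids "$1" &&
    run aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3" &&
    run aws ec2 wait volume-in-use --volume-ids "$1" &&
    run aws ec2 start-instances --instance-ids "$2" &&
    run aws ec2 wait instance-status-ok --instance-ids "$2"
}

restore_root_volume "${ROOT_VOLUME_ID:-vol-0123456789abcdef0}" \
  "${IMPAIRED_ID:-i-0123456789abcdef0}" "${ROOT_DEVICE:-/dev/xvda}"
```

The instance-status-ok waiter gives up after a fixed number of polls. If it times out, check the instance's status checks and system log in the console before you retry.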
Note: You can also use an AWS Systems Manager Automation document to troubleshoot connection issues. For more information, see Walkthrough: Run the EC2Rescue tool on unreachable instances. The AWSSupport-ExecuteEC2Rescue document is designed to perform a combination of Systems Manager actions, AWS CloudFormation actions, and AWS Lambda functions that automate the steps normally required to use EC2Rescue for Linux.
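As a rough illustration, you can start the Automation document from the AWS CLI with start-automation-execution. In the following sketch, the parameter name UnreachableInstanceId is an assumption, so confirm it against the document's parameter list in the Systems Manager console. The instance ID is a placeholder, and the script only prints the command unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: start the AWSSupport-ExecuteEC2Rescue Automation document.
# The parameter name UnreachableInstanceId is an assumption; verify it before use.
# DRY_RUN=1 (the default) only prints the command.

DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# start_ec2rescue INSTANCE_ID: print the Automation execution ID on success.
start_ec2rescue() {
  run aws ssm start-automation-execution \
    --document-name AWSSupport-ExecuteEC2Rescue \
    --parameters "UnreachableInstanceId=$1" \
    --query AutomationExecutionId --output text
}

start_ec2rescue "${IMPAIRED_ID:-i-0123456789abcdef0}"
```

You can then follow the execution's progress with aws ssm get-automation-execution --automation-execution-id <id>.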
- For general instructions on recovering a Linux instance, see Instance Recovery When a Host Computer Fails. For Windows instances, see Troubleshoot an Unreachable Instance.
- If your instance's root device is an Amazon EBS-backed volume, try to stop and start the instance. For more information, see Stop and start your instance.
- For instance-store backed instances, if you created a custom AMI of the instance, you might be able to restore your instance using the AMI as a backup. For instructions on creating a new instance from an AMI you own, see Launching your instance from an AMI.
- In some cases, your EBS volume might have I/O access disabled, which can render your instance inaccessible. For instructions on how to identify and troubleshoot this, see Working with the Auto-Enabled IO Volume Attribute.
- If you lost the private key for your SSH key pair, you can reset access by using Systems Manager Automation with the AWSSupport-ResetAccess document.