My EC2 Linux instance is no longer responding or has boot issues. How do I use EC2Rescue for Linux to troubleshoot operating system-level issues?

Last updated: 2020-04-15

I can't connect to my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance or I'm experiencing boot issues. To correct these problems, I need to fix common issues such as OpenSSH file permissions or gather system (OS) logs for analysis and troubleshooting. How can I use EC2Rescue for Linux to do this?

Short Description

EC2Rescue for Linux is a tool that runs on your Amazon EC2 Linux instance to diagnose and correct operating system-level issues. It also collects advanced logs, system utilization reports, and configuration files for further analysis.

Common scenarios addressed by EC2Rescue for Linux:

  • Collect system utilization reports such as vmstat, iostat, and mpstat.
  • Collect logs and details such as syslog, dmesg, application error logs, and SSM logs.
  • Detect system problems such as asymmetric routing or duplicate root device labels.
  • Automatically remediate system problems such as correcting OpenSSH file permissions or disabling known problematic kernel parameters.

System Requirements

EC2Rescue for Linux requires an Amazon EC2 Linux instance that meets the following prerequisites:

Supported operating systems

  • Amazon Linux 2
  • Amazon Linux 2016.09+
  • SLES 12+
  • RHEL 7+
  • Ubuntu 16.04+

Software requirements

  • Python 2.7.9+ or 3.2+
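
To confirm the Python requirement before you run the tool, a check along the following lines might help. This is a minimal sketch: the function name python_version_ok is hypothetical and not part of EC2Rescue for Linux, and it only compares version numbers against the 2.7.9+ / 3.2+ requirement listed above.

```shell
# Hypothetical helper (not part of EC2Rescue for Linux).
# Returns 0 when a version string such as "3.8.10" satisfies the stated
# requirement: 2.7.9 or later on the 2.x line, or 3.2 or later on the 3.x line.
python_version_ok() {
  ver=$1
  major=${ver%%.*}
  case "$ver" in
    *.*) rest=${ver#*.} ;;
    *)   rest=0 ;;
  esac
  minor=${rest%%.*}
  case "$rest" in
    *.*) patch=${rest#*.} ;;
    *)   patch=0 ;;
  esac
  # Drop any non-numeric suffix (for example "rc1" or "+").
  minor=${minor%%[!0-9]*}; minor=${minor:-0}
  patch=${patch%%[!0-9]*}; patch=${patch:-0}
  case "$major" in
    2) [ "$minor" -gt 7 ] || { [ "$minor" -eq 7 ] && [ "$patch" -ge 9 ]; } ;;
    3) [ "$minor" -ge 2 ] ;;
    *) return 1 ;;
  esac
}

# Report on whichever interpreters are installed on this instance.
for py in python3 python; do
  if command -v "$py" >/dev/null 2>&1; then
    v=$("$py" -c 'import platform; print(platform.python_version())')
    if python_version_ok "$v"; then
      echo "$py $v: meets the requirement"
    else
      echo "$py $v: does not meet the requirement"
    fi
  fi
done
```

If you plan to run the tool inside a chroot on an offline root volume, run the check inside the chroot so that it reports the Python version installed on that volume.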

Resolution

To troubleshoot an unreachable Amazon EC2 Linux instance with the help of a rescue instance:

1.    Launch a new Amazon EC2 instance in your virtual private cloud (VPC) using the same Amazon Machine Image (AMI) and in the same Availability Zone as the impaired instance. The new instance becomes your "rescue" instance.

Alternatively, use an existing instance that you can access, provided that it uses the same AMI and is in the same Availability Zone as your impaired instance.

2.    Detach the Amazon Elastic Block Store (Amazon EBS) root volume (/dev/xvda or /dev/sda1) from your impaired instance.

3.    Attach the EBS volume as a secondary device (/dev/sdf) to the rescue instance.
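
If you prefer the AWS CLI to the console for steps 2 and 3, the sequence might look like the following sketch. The instance and volume IDs are hypothetical placeholders, and the script assumes that the impaired instance must be stopped first, because a root volume can be detached only from a stopped instance. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
# Hypothetical placeholder IDs; replace them with your own values.
IMPAIRED_INSTANCE_ID=${IMPAIRED_INSTANCE_ID:-i-0123456789abcdef0}
ROOT_VOLUME_ID=${ROOT_VOLUME_ID:-vol-0123456789abcdef0}
RESCUE_INSTANCE_ID=${RESCUE_INSTANCE_ID:-i-0fedcba9876543210}

# Print the command when DRY_RUN=1 (the default), otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: impaired instance ID, root volume ID, rescue instance ID.
move_root_volume_to_rescue() {
  # Assumption: stop the impaired instance before detaching its root volume.
  run aws ec2 stop-instances --instance-ids "$1"
  run aws ec2 wait instance-stopped --instance-ids "$1"
  # Step 2: detach the root volume from the impaired instance.
  run aws ec2 detach-volume --volume-id "$2"
  run aws ec2 wait volume-available --volume-ids "$2"
  # Step 3: attach the volume to the rescue instance as /dev/sdf.
  run aws ec2 attach-volume --volume-id "$2" --instance-id "$3" --device /dev/sdf
}

move_root_volume_to_rescue "$IMPAIRED_INSTANCE_ID" "$ROOT_VOLUME_ID" "$RESCUE_INSTANCE_ID"
```

Review the printed commands before you run them with DRY_RUN=0, and note the original device name of the root volume so that you can reattach it under the same name later.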

4.    Connect to your rescue instance using SSH.

5.    Create a mount point directory (/rescue) for the new volume attached to the rescue instance in step 3.

$ sudo mkdir /rescue

6.    Mount the volume at the directory you created in step 5.

$ sudo mount /dev/xvdf1 /rescue

Note: The device (/dev/xvdf1) might be attached to the rescue instance with a different device name. Use the lsblk command to view your available disk devices along with their mount points to determine the correct device names.

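Note: If the mount fails, run dmesg | tail to check the kernel log. If the log reports a conflicting (duplicate) UUID, which can happen with XFS file systems when the rescue instance was launched from the same AMI, mount the volume with the -o nouuid option.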
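
To narrow down which device to mount, you can filter the lsblk output for partitions that have no mount point. The following is a rough sketch: the function name unmounted_partitions is hypothetical, and it assumes the raw, headerless output of lsblk with the NAME, TYPE, and MOUNTPOINT columns.

```shell
# Hypothetical helper: reads "NAME TYPE MOUNTPOINT" rows on stdin and prints
# the device path of every partition whose MOUNTPOINT column is empty.
unmounted_partitions() {
  awk '$2 == "part" && $3 == "" { print "/dev/" $1 }'
}

# -r: raw output, -n: no header line, -o: select columns.
if command -v lsblk >/dev/null 2>&1; then
  lsblk -rno NAME,TYPE,MOUNTPOINT | unmounted_partitions
fi
```

A volume that has no partition table appears with the type disk rather than part, so it is not listed by this filter. In that case, check the full lsblk output instead.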

7.    Change the root directory (chroot) to the newly mounted volume.

$ sudo -i
# for i in proc sys dev run; do mount --bind /$i /rescue/$i ; done
# chroot /rescue

8.    Download and extract EC2Rescue for Linux onto the offline root volume (you are now working inside the chroot).

$ curl -O https://s3.amazonaws.com/ec2rescuelinux/ec2rl.tgz
$ tar -xvf ec2rl.tgz

9.    Verify the installation by listing the help file.

$ cd ec2rl-<version_number>
$ ./ec2rl help

10.    Run EC2Rescue for Linux with sudo and no additional options to run all modules.

$ sudo ./ec2rl run

11.    View the results in /var/tmp/ec2rl.

$ cat /var/tmp/ec2rl/<logfile_location>/Main.log

12.    Based on the results, rerun EC2Rescue for Linux with remediation enabled for the supported modules.

$ ./ec2rl run --remediate

13.    After remediation is complete, exit from chroot and unmount the secondary device.

$ exit
$ sudo umount /rescue

Note: If the unmount operation isn't successful, you might have to stop or reboot the rescue instance to enable a clean unmount.
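
One common reason that the unmount in step 13 fails is that the bind mounts created in step 7 (proc, sys, dev, and run) are still active under /rescue. Before you stop or reboot the rescue instance, you could try to unmount them first, as in the following sketch. The function name unmount_rescue is hypothetical. The script must run as root outside the chroot, and by default it only prints the commands; set DRY_RUN=0 to run them.

```shell
# Print the command when DRY_RUN=1 (the default), otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Argument: the mount point of the rescued volume (defaults to /rescue).
unmount_rescue() {
  mnt=${1:-/rescue}
  # Release the bind mounts in the reverse order of step 7.
  for i in run dev sys proc; do
    run umount "$mnt/$i"
  done
  # Then unmount the rescued volume itself.
  run umount "$mnt"
}

unmount_rescue /rescue
```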

14.    Detach the secondary volume (/dev/sdf) from the rescue instance, and then attach it to the original instance as the root volume, using the same device name that it had before you detached it in step 2 (/dev/xvda or /dev/sda1).

15.    Start the EC2 instance, and then verify that the instance is responsive.

Note: You can also use an AWS Systems Manager Automation document to troubleshoot connection issues. For more information, see Walkthrough: Run the EC2Rescue tool on unreachable instances. The AWSSupport-ExecuteEC2Rescue document is designed to perform a combination of Systems Manager actions, AWS CloudFormation actions, and AWS Lambda functions that automate the steps normally required to use EC2Rescue for Linux.
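
As a rough illustration, the Automation document can also be started from the AWS CLI. In the following sketch, the instance ID is a hypothetical placeholder, and the parameter name UnreachableInstanceId is an assumption that you should verify against the document's parameter list in the Systems Manager console. By default the script only prints the command; set DRY_RUN=0 to run it.

```shell
# Hypothetical placeholder ID; replace it with the ID of your impaired instance.
UNREACHABLE_INSTANCE_ID=${UNREACHABLE_INSTANCE_ID:-i-0123456789abcdef0}

# Print the command when DRY_RUN=1 (the default), otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Argument: the ID of the unreachable instance.
start_ec2rescue_automation() {
  # Assumption: the document takes the instance ID as UnreachableInstanceId.
  run aws ssm start-automation-execution \
    --document-name "AWSSupport-ExecuteEC2Rescue" \
    --parameters "UnreachableInstanceId=$1"
}

start_ec2rescue_automation "$UNREACHABLE_INSTANCE_ID"
```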
