How do I revert to a known stable kernel after an update prevents my Amazon EC2 instance from rebooting successfully?

Last updated: 2020-02-05

How do I revert to a stable kernel after an update prevents my Amazon Elastic Compute Cloud (Amazon EC2) instance from rebooting successfully?

Short Description

If you perform a kernel update on your EC2 Linux instance and the new kernel is corrupt, then the instance can't reboot and you can't use SSH to connect to it. However, you can create a temporary rescue instance, remount your Amazon Elastic Block Store (Amazon EBS) volume on the rescue instance, and then configure GRUB to boot from the previous, stable kernel.

Important: Don't perform this procedure on an instance store-backed instance. Because the recovery procedure requires a stop and start of your instance, any data on that instance is lost. For more information, see Determining the Root Device Type of Your Instance.
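To confirm the root device type before you proceed, you can query it with the AWS CLI. The command below is only printed (echoed) as a dry run, and the instance ID is a placeholder; remove the echo and substitute your own instance ID to run the query.

```shell
# Hypothetical instance ID -- replace with your own.
# The echo makes this a dry run; remove it to execute the query.
echo aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
    --query "Reservations[].Instances[].RootDeviceType" --output text
```

A value of ebs means the instance is EBS-backed and safe for this procedure; instance-store means it is not.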

Resolution

Attach the root volume to a rescue EC2 instance

1.    Create an EBS snapshot of the root volume. For more information, see Creating Amazon EBS Snapshots.

2.    Open the Amazon EC2 console.

Note: Be sure that you are in the correct Region.

3.    Choose Instances from the navigation pane, and then choose the impaired instance.

4.    Choose Actions, select Instance State, and then choose Stop.

5.    In the Description tab, under the Root device, choose /dev/sda1, and then choose the EBS ID.

Note: The root device name varies by AMI, but /dev/xvda and /dev/sda1 are always reserved for the root device. For example, Amazon Linux 1 and 2 use /dev/xvda, while other distributions, such as Ubuntu 14, 16, and 18, CentOS 7, and RHEL 7.5, use /dev/sda1.

6.    Choose Actions, select Detach Volume, and then choose Yes, Detach. Note the Availability Zone.

7.    Launch a rescue EC2 instance in the same Availability Zone.

Note: Depending on the product code, you might be required to launch an EC2 instance of the same OS type. For example, if the impaired EC2 instance is a paid RHEL AMI, you must launch an AMI with the same product code. For more information, see Getting the Product Code for Your Instance.

8.    After the rescue instance launches, choose Volumes from the navigation pane, and then choose the detached root volume of the impaired instance.

9.    Choose Actions, and then select Attach Volume.

10.    Choose the rescue instance ID (i-xxxxx), and then set an unused device. In this example, we use /dev/xvdb.
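Steps 4 through 10 can also be scripted with the AWS CLI. The sketch below uses placeholder instance and volume IDs, and each command is echoed rather than executed; drop the echo and substitute your own IDs before running it.

```shell
# Placeholder IDs -- replace with your own values.
IMPAIRED_ID="i-0aaaaaaaaaaaaaaaa"
RESCUE_ID="i-0bbbbbbbbbbbbbbbb"
ROOT_VOL="vol-0ccccccccccccccc"

# Each command is echoed as a dry run; remove 'echo' to execute.
echo aws ec2 stop-instances --instance-ids "$IMPAIRED_ID"
echo aws ec2 detach-volume --volume-id "$ROOT_VOL"
echo aws ec2 attach-volume --volume-id "$ROOT_VOL" \
    --instance-id "$RESCUE_ID" --device /dev/xvdb
```

Remember to wait for the instance to reach the stopped state and for the volume to become available before detaching and attaching.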

Mount the volume of the impaired instance

1.    Use SSH to connect to the rescue instance.

2.    Run the lsblk command to view your available disk devices:

lsblk

The following is an example of the output:

NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  15G  0 disk
└─xvda1 202:1    0  15G  0 part /
xvdb    202:16   0  15G  0 disk
└─xvdb1 202:17   0  15G  0 part

Note: Nitro-based instances expose EBS volumes as NVMe block devices. The output generated by the lsblk command on Nitro-based instances shows the disk names as nvme[0-26]n1. For more information, see Amazon EBS and NVMe on Linux instances.

3.    Create a mount directory, and then mount the root partition of the attached volume to this new directory. In the preceding example, /dev/xvdb1 is the root partition of the attached volume. For more information, see Making an Amazon EBS Volume Available for Use on Linux.

sudo mkdir /mount
sudo mount /dev/xvdb1 /mount

You can now access the data of the impaired instance through the mount directory.

4.    Mount /dev, /run, /proc, and /sys of the rescue instance to the same paths as the newly mounted volume:

sudo mount -o bind /dev /mount/dev
sudo mount -o bind /run /mount/run
sudo mount -o bind /proc /mount/proc 
sudo mount -o bind /sys /mount/sys

5.    Run the chroot command to change the root directory to the mount directory:

sudo chroot /mount

Update the default kernel in the GRUB bootloader

The current corrupt kernel is in position 0 (zero) in the list, and the last stable kernel is in position 1. To replace the corrupt kernel with the stable kernel, choose one of the following procedures, based on your distribution:

GRUB1 (Legacy GRUB) for Red Hat 6 and Amazon Linux 1

GRUB2 for Ubuntu 14 LTS and 16.04

GRUB2 for RHEL 7.5 and Amazon Linux 2

GRUB2 for RHEL 8 and CentOS 8

GRUB1 (Legacy GRUB) for Red Hat 6 and Amazon Linux 1

Use the sed command to replace the corrupt kernel with the stable kernel in the /boot/grub/grub.conf file:

sudo sed -i '/^default/ s/0/1/' /boot/grub/grub.conf
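If you want to confirm what this sed command does before editing the real file, you can rehearse it on a scratch copy. The sample grub.conf content below is illustrative:

```shell
# Create a scratch copy with a sample 'default' line (illustrative content).
cat > /tmp/grub.conf <<'EOF'
default=0
timeout=1
hiddenmenu
EOF

# The same edit as above, applied to the scratch copy.
sed -i '/^default/ s/0/1/' /tmp/grub.conf

# The default entry now points at position 1 (the last stable kernel).
grep '^default' /tmp/grub.conf   # prints: default=1
```

Note that the substitution only touches lines beginning with "default", so other settings such as timeout are left alone.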

GRUB2 for Ubuntu 14 LTS and 16.04

1.    Replace the corrupt GRUB_DEFAULT=0 default menu entry with the stable GRUB_DEFAULT=saved value in the /etc/default/grub file:

sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /etc/default/grub

2.    Update grub to recognize the change:

sudo update-grub

3.    Run the grub-set-default command so that the stable kernel, at position 1, loads at the next reboot:

sudo grub-set-default 1
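As a sanity check, the GRUB_DEFAULT substitution in step 1 can likewise be rehearsed on a scratch copy of the file before you change /etc/default/grub (the sample content is illustrative):

```shell
# Scratch copy of /etc/default/grub with a sample GRUB_DEFAULT line.
cat > /tmp/default-grub <<'EOF'
GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT=0
GRUB_TIMEOUT=0
EOF

# The same substitution as in step 1, applied to the scratch copy.
sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /tmp/default-grub

grep '^GRUB_DEFAULT' /tmp/default-grub   # prints: GRUB_DEFAULT=saved
```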

GRUB2 for RHEL 7.5 and Amazon Linux 2

1.    Replace the corrupt GRUB_DEFAULT=0 default menu entry with the stable GRUB_DEFAULT=saved value in the /etc/default/grub file:

sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /etc/default/grub

2.    Update grub to regenerate the /boot/grub2/grub.cfg file:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

3.    Run the grub2-set-default command so that the stable kernel, at position 1, loads at the next reboot:

sudo grub2-set-default 1

4.    Type exit to leave the chroot environment.

GRUB2 for RHEL 8 and CentOS 8

GRUB2 in RHEL 8 and CentOS 8 uses blscfg files and entries in /boot/loader for the boot configuration, instead of the previous grub.cfg format. The grubby tool is recommended for managing the blscfg files and retrieving information from /boot/loader/entries/. If the blscfg files are missing from this location or corrupt, grubby doesn't show any results, and you must regenerate the files to recover functionality. The indexing of the kernels therefore depends on the .conf files located under /boot/loader/entries and on the kernel versions. Indexing keeps the latest kernel at the lowest index. For information on how to regenerate BLS configuration files, see How can I recover my Red Hat 8 or CentOS 8 instance that is failing to boot due to issues with the Grub2 BLS configuration file?
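The BLS entries that grubby reads are plain key/value files, one per kernel. The sketch below parses a sample entry (the file content is illustrative, written to /tmp rather than /boot) to show where the kernel path that grubby reports comes from:

```shell
# A sample BLS entry file, like those under /boot/loader/entries/ (illustrative).
cat > /tmp/sample-bls.conf <<'EOF'
title Red Hat Enterprise Linux (4.18.0-80.4.2.el8_0.x86_64) 8.0 (Ootpa)
version 4.18.0-80.4.2.el8_0.x86_64
linux /boot/vmlinuz-4.18.0-80.4.2.el8_0.x86_64
initrd /boot/initramfs-4.18.0-80.4.2.el8_0.x86_64.img
EOF

# The 'linux' key holds the kernel path that grubby --info=ALL reports.
awk '$1 == "linux" {print $2}' /tmp/sample-bls.conf
```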

1.    Run the grubby --default-kernel command to see the current default kernel:

grubby --default-kernel

2.    Run the grubby --info=ALL command to see all available kernels and their indexes:

grubby --info=ALL

The following is example output from the --info=ALL command:

[root@ip-10-10-1-111 ~]# grubby --info=ALL
index=0
kernel="/boot/vmlinuz-4.18.0-147.3.1.el8_1.x86_64"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau crashkernel=auto $tuned_params"
root="UUID=a727b695-0c21-404a-b42b-3075c8deb6ab"
initrd="/boot/initramfs-4.18.0-147.3.1.el8_1.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (4.18.0-147.3.1.el8_1.x86_64) 8.1 (Ootpa)"
id="2bb67fbca2394ed494dc348993fb9b94-4.18.0-147.3.1.el8_1.x86_64"
index=1
kernel="/vmlinuz-0-rescue-2bb67fbca2394ed494dc348993fb9b94"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau crashkernel=auto"
root="UUID=a727b695-0c21-404a-b42b-3075c8deb6ab"
initrd="/initramfs-0-rescue-2bb67fbca2394ed494dc348993fb9b94.img"
title="Red Hat Enterprise Linux (0-rescue-2bb67fbca2394ed494dc348993fb9b94) 8.1 (Ootpa)"
id="2bb67fbca2394ed494dc348993fb9b94-0-rescue"
index=2
kernel="/boot/vmlinuz-4.18.0-80.4.2.el8_0.x86_64"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau crashkernel=auto $tuned_params"
root="UUID=a727b695-0c21-404a-b42b-3075c8deb6ab"
initrd="/boot/initramfs-4.18.0-80.4.2.el8_0.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (4.18.0-80.4.2.el8_0.x86_64) 8.0 (Ootpa)"
id="c74bc11fb3d6436bb2716196dd0e7a47-4.18.0-80.4.2.el8_0.x86_64"

Note the path of the kernel that you want to set as the default for your instance. In the preceding example, the path for the kernel at index 2 is /boot/vmlinuz-4.18.0-80.4.2.el8_0.x86_64.

3.    Run the grubby --set-default command to change the default kernel of the instance:

grubby --set-default=/boot/vmlinuz-4.18.0-80.4.2.el8_0.x86_64

Note: Replace 4.18.0-80.4.2.el8_0.x86_64 with your kernel's version number.

4.    To verify that the preceding command worked, run the grubby --default-kernel command to see the current default kernel:

grubby --default-kernel

5.    Exit from chroot, and unmount /dev, /run, /proc, and /sys:

exit
sudo umount /mount/dev
sudo umount /mount/run
sudo umount /mount/proc
sudo umount /mount/sys
sudo umount /mount

Detach the root volume from the rescue instance and attach it to the impaired instance

1.    From the Amazon EC2 console, choose Instances, and then choose the rescue instance.

2.    Choose Actions, choose Instance State, choose Stop, and then select Yes, Stop.

3.    Detach the root volume vol-xxx from the rescue instance.

4.    Attach the root volume you detached in step 3 to the impaired instance as the root volume (/dev/sda1), and then start the instance.

Note: The root device name varies by distribution. For Amazon Linux 1 and 2, the root volume must be attached as /dev/xvda. For RHEL, CentOS, and Ubuntu, the root volume must be attached as /dev/sda1. If you use the wrong device name, you receive an error.

The stable kernel now loads and your instance reboots.

