How do I revert to a known stable kernel after an update prevents my Amazon EC2 instance from rebooting successfully?

Last updated: 2021-02-04

Short description

If you performed a kernel update on your EC2 Linux instance and the new kernel is corrupt, the instance can't boot, and you can't use SSH to connect to it. However, you can create a temporary rescue instance and mount the impaired instance's Amazon Elastic Block Store (Amazon EBS) root volume on it. From the rescue instance, you can then configure GRUB to boot from the previous, stable kernel.

Important: Don't perform this procedure on an instance store-backed instance. Because the recovery procedure requires a stop and start of your instance, any data on that instance is lost. For more information, see Determine the root device type of your instance.
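
If you prefer the command line, you can check the root device type with the AWS CLI. This is a minimal sketch; the instance ID is a placeholder that you must replace with your own:

# Returns "ebs" or "instance-store" for the given instance.
aws ec2 describe-instances --instance-ids i-1234567890abcdef0 \
  --query "Reservations[].Instances[].RootDeviceType" --output text

A result of ebs means that this procedure applies; instance-store means that it doesn't.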

Resolution

Attach the root volume to a rescue EC2 instance

1.    Create an EBS snapshot of the root volume. For more information, see Create Amazon EBS snapshots.

2.    Open the Amazon EC2 console.

Note: Be sure that you're in the correct Region.

3.    Select Instances from the navigation pane, and then choose the impaired instance.

4.    Choose Instance State, Stop instance, and then select Stop.

5.    In the Storage tab, under Block devices, select the Volume ID for /dev/sda1.

Note: The root device name differs by AMI, but the names /dev/xvda and /dev/sda1 are reserved for the root device. For example, Amazon Linux 1 and 2 use /dev/xvda. Other distributions, such as Ubuntu 14, 16, and 18, CentOS 7, and RHEL 7.5, use /dev/sda1.

6.    Choose Actions, Detach Volume, and then select Yes, Detach. Note the Availability Zone.

7.    Launch a rescue EC2 instance in the same Availability Zone.

Note: Depending on the product code of the impaired instance's AMI, you might need to launch a rescue instance of the same OS type. For example, if the impaired instance uses a paid RHEL AMI, the rescue instance must use an AMI with the same product code. For more information, see Get the product code for your instance.

8.    After the rescue instance launches, choose Volumes from the navigation pane, and then choose the detached root volume of the impaired instance.

9.    Choose Actions, Attach Volume.

10.    Choose the rescue instance ID (i-xxxxx), and then specify an unused device name. This example uses /dev/xvdb. The AWS CLI sketch that follows shows an equivalent set of commands for these console steps.
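
The following is a minimal AWS CLI sketch of the attach steps above. The instance IDs, volume ID, and device name are placeholders; replace them with your own values:

# Back up the impaired instance's root volume before making changes.
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "Backup before kernel rollback"

# Stop the impaired instance and wait for it to reach the stopped state.
aws ec2 stop-instances --instance-ids i-0aaaaaaaaaaaaaaaa
aws ec2 wait instance-stopped --instance-ids i-0aaaaaaaaaaaaaaaa

# Detach the root volume from the impaired instance, then attach it to the
# rescue instance as a secondary device.
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0bbbbbbbbbbbbbbbb --device /dev/xvdb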

Mount the volume of the impaired instance

1.    Use SSH to connect to the rescue instance.

2.    Run the lsblk command to view your available disk devices:

lsblk

The following is an example of the output:

NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  15G  0 disk
└─xvda1 202:1    0  15G  0 part /
xvdb    202:16   0  15G  0 disk
└─xvdb1 202:17   0  15G  0 part

Note: Nitro-based instances expose EBS volumes as NVMe block devices. The output generated by the lsblk command on Nitro-based instances shows the disk names as nvme[0-26]n1. For more information, see Amazon EBS and NVMe on Linux instances.
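
On a Nitro-based rescue instance, you can map an NVMe device back to its EBS volume by displaying the device serial number, which in most cases is the volume ID without the hyphen. This is a minimal sketch:

# SERIAL shows the EBS volume ID (for example, vol0123456789abcdef0) for each NVMe device.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,SERIAL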

3.    Create a mount directory, and then mount the root partition of the attached volume to this new directory. In the preceding example, /dev/xvdb1 is the root partition of the attached volume. For more information, see Make an Amazon EBS volume available for use on Linux.

sudo mkdir /mount
sudo mount /dev/xvdb1 /mount

You can now access the data of the impaired instance through the mount directory.
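
As a quick sanity check, assuming the mount commands above succeeded, you can confirm that the mounted partition is the impaired instance's root filesystem:

ls /mount                     # should list bin, boot, etc, home, and so on
cat /mount/etc/os-release     # shows the distribution (on older distributions, check /etc/system-release instead)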

4.    Mount /dev, /run, /proc, and /sys of the rescue instance to the corresponding paths under the newly mounted volume:

sudo mount -o bind /dev /mount/dev
sudo mount -o bind /run /mount/run
sudo mount -o bind /proc /mount/proc 
sudo mount -o bind /sys /mount/sys

5.    Run the chroot command to change the root directory to the mount directory:

sudo chroot /mount
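
Inside the chroot, commands now operate on the impaired instance's volume rather than on the rescue instance's own root. A quick way to confirm this, and to see which kernels are installed, assuming standard paths:

cat /etc/os-release      # distribution of the impaired instance (or /etc/system-release on older distributions)
ls /boot/vmlinuz-*       # installed kernel images, including the corrupt and the previous stable kernel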

Update the default kernel in the GRUB bootloader

The current, corrupt kernel is at position 0 (zero) in the list, and the last stable kernel is at position 1. To replace the corrupt kernel with the stable kernel, use one of the following procedures, depending on your distribution:

GRUB1 (Legacy GRUB) for Red Hat 6 and Amazon Linux 1

GRUB2 for Ubuntu 14 LTS and 16.04

GRUB2 for RHEL 7.5 and Amazon Linux 2

GRUB2 for RHEL 8 and CentOS 8

GRUB1 (Legacy GRUB) for Red Hat 6 and Amazon Linux 1

Use the sed command to replace the corrupt kernel with the stable kernel in the /boot/grub/grub.conf file:

sudo sed -i '/^default/ s/0/1/' /boot/grub/grub.conf
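
You can verify the change before rebooting. The following sketch assumes the standard legacy GRUB layout, where entry 0 is the newest (corrupt) kernel and entry 1 is the previous stable kernel:

grep ^default /boot/grub/grub.conf    # should now show default=1
grep ^title /boot/grub/grub.conf      # lists the kernel entries in order (0, 1, ...)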

GRUB2 for Ubuntu 14 LTS and 16.04

1.    Replace the GRUB_DEFAULT=0 default menu entry with GRUB_DEFAULT=saved in the /etc/default/grub file:

sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /etc/default/grub

2.    Update grub to recognize the change:

sudo update-grub

3.    Run the grub-set-default command so that the stable kernel loads at the next reboot. In this example, the default is set to the kernel at index 1, the last stable kernel:

sudo grub-set-default 1
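
If you're not sure which index corresponds to the stable kernel, you can list the entries in the generated configuration. This sketch assumes the default /boot/grub/grub.cfg path; entries nested under a submenu (such as "Advanced options for Ubuntu") are addressed with the "X>Y" syntax:

# Print top-level menu entries and submenus with a 0-based index.
awk -F\' '/^menuentry |^submenu / {print i++ " : " $2}' /boot/grub/grub.cfg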

GRUB2 for RHEL 7.5 and Amazon Linux 2

1.    Replace the GRUB_DEFAULT=0 default menu entry with GRUB_DEFAULT=saved in the /etc/default/grub file:

sed -i 's/GRUB_DEFAULT=0/GRUB_DEFAULT=saved/g' /etc/default/grub

2.    Update grub to regenerate the /boot/grub2/grub.cfg file:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

3.    Run the grub2-set-default command so that the stable kernel loads at the next reboot. In this example, the default is set to the kernel at index 1, the last stable kernel:

sudo grub2-set-default 1
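
You can confirm that the saved default was recorded and list the available menu entries before rebooting. This sketch assumes the standard /boot/grub2/grub.cfg location:

sudo grub2-editenv list                                              # should show saved_entry=1
awk -F\' '/^menuentry / {print i++ " : " $2}' /boot/grub2/grub.cfg   # lists menu entries by index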

GRUB2 for RHEL 8 and CentOS 8

GRUB2 in RHEL 8 and CentOS 8 uses blscfg files and entries in /boot/loader for the boot configuration, instead of the previous grub.cfg format. It's a best practice to use the grubby tool to manage the blscfg files and to retrieve information from /boot/loader/entries/. If the blscfg files are missing from this location or are corrupted, grubby doesn't show any results, and you must regenerate the files to recover functionality. The indexing of the kernels depends on the .conf files located under /boot/loader/entries and on the kernel versions, with the latest kernel assigned the lowest index. For information on how to regenerate BLS configuration files, see How can I recover my Red Hat 8 or CentOS 8 instance that is failing to boot due to issues with the Grub2 BLS configuration file?
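
Before using grubby, you can confirm that the BLS entry files exist. Each installed kernel should have one .conf file in this directory:

ls -l /boot/loader/entries/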

1.    Run the grubby --default-kernel command to see the current default kernel:

grubby --default-kernel

2.    Run the grubby --info=ALL command to see all available kernels and their indexes:

grubby --info=ALL

The following is example output from the --info=ALL command:

[root@ip-10-10-1-111 ~]# grubby --info=ALL
index=0
kernel="/boot/vmlinuz-4.18.0-147.3.1.el8_1.x86_64"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau crashkernel=auto $tuned_params"
root="UUID=a727b695-0c21-404a-b42b-3075c8deb6ab"
initrd="/boot/initramfs-4.18.0-147.3.1.el8_1.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (4.18.0-147.3.1.el8_1.x86_64) 8.1 (Ootpa)"
id="2bb67fbca2394ed494dc348993fb9b94-4.18.0-147.3.1.el8_1.x86_64"
index=1
kernel="/vmlinuz-0-rescue-2bb67fbca2394ed494dc348993fb9b94"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau crashkernel=auto"
root="UUID=a727b695-0c21-404a-b42b-3075c8deb6ab"
initrd="/initramfs-0-rescue-2bb67fbca2394ed494dc348993fb9b94.img"
title="Red Hat Enterprise Linux (0-rescue-2bb67fbca2394ed494dc348993fb9b94) 8.1 (Ootpa)"
id="2bb67fbca2394ed494dc348993fb9b94-0-rescue"
index=2
kernel="/boot/vmlinuz-4.18.0-80.4.2.el8_0.x86_64"
args="ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 rd.blacklist=nouveau crashkernel=auto $tuned_params"
root="UUID=a727b695-0c21-404a-b42b-3075c8deb6ab"
initrd="/boot/initramfs-4.18.0-80.4.2.el8_0.x86_64.img $tuned_initrd"
title="Red Hat Enterprise Linux (4.18.0-80.4.2.el8_0.x86_64) 8.0 (Ootpa)"
id="c74bc11fb3d6436bb2716196dd0e7a47-4.18.0-80.4.2.el8_0.x86_64"

Note the path of the kernel that you want to set as the default for your instance. In the preceding example, the path of the kernel at index 2 is /boot/vmlinuz-4.18.0-80.4.2.el8_0.x86_64.

3.    Run the grubby --set-default command to change the default kernel of the instance:

grubby --set-default=/boot/vmlinuz-4.18.0-80.4.2.el8_0.x86_64

Note: Replace 4.18.0-80.4.2.el8_0.x86_64 with your kernel's version number.

4.    Run the grubby --default-kernel command to verify that the preceding command worked:

grubby --default-kernel

Unmount volumes, detach the root volume from the rescue instance, and then attach it to the impaired instance

1.    Exit from chroot, and unmount /dev, /run, /proc, and /sys:

exit
sudo umount /mount/dev
sudo umount /mount/run
sudo umount /mount/proc
sudo umount /mount/sys
sudo umount /mount
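
Note: If any of the umount commands fail with a "target is busy" error, make sure that you exited the chroot. With recent versions of util-linux, you can also unmount the whole tree recursively in one step:

sudo umount -R /mount    # recursively unmounts /mount/dev, /mount/run, /mount/proc, /mount/sys, and /mount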

2.    From the Amazon EC2 console, choose Instances, and then choose the rescue instance.

3.    Choose Instance State, Stop instance, and then select Stop.

4.    Detach the impaired instance's root volume (vol-xxx) from the rescue instance.

5.    Attach the root volume that you detached in step 4 to the impaired instance as the root volume (/dev/sda1), and then start the instance.

Note: The root device name differs by AMI, but the names /dev/xvda and /dev/sda1 are reserved for the root device. For example, Amazon Linux 1 and 2 use /dev/xvda. Other distributions, such as Ubuntu 14, 16, and 18, CentOS 7, and RHEL 7.5, use /dev/sda1.
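
As with the earlier attach steps, you can perform the stop, detach, attach, and start operations with the AWS CLI. This is a minimal sketch with placeholder IDs:

# Stop the rescue instance and wait for it to reach the stopped state.
aws ec2 stop-instances --instance-ids i-0bbbbbbbbbbbbbbbb
aws ec2 wait instance-stopped --instance-ids i-0bbbbbbbbbbbbbbbb

# Move the root volume back to the impaired instance and start it.
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0aaaaaaaaaaaaaaaa --device /dev/sda1
aws ec2 start-instances --instance-ids i-0aaaaaaaaaaaaaaaa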

The stable kernel now loads and your instance boots successfully.

