Why is my EC2 Linux instance going into emergency mode when I try to boot it?


When I boot my Amazon Elastic Compute Cloud (Amazon EC2) Linux instance, the instance goes into emergency mode and the boot process fails. Then, the instance is inaccessible.

Short description

An instance might boot in emergency mode for the following reasons:

  • There is a corrupted kernel on the instance.
  • There are auto-mount failures due to incorrect entries in the /etc/fstab file.

To identify the type of error, view the instance's console output. If the kernel is corrupted, a kernel panic error message appears in the console output. If auto-mount failures occur, "Dependency failed" messages appear instead.
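You can retrieve the console output from the Amazon EC2 console or with the AWS CLI. A sketch (the instance ID and Region are placeholders; replace them with your own values):

```shell
# Retrieve the console output for the impaired instance.
aws ec2 get-console-output \
  --instance-id i-0123456789abcdef0 \
  --region us-east-1 \
  --output text
```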

Resolution

Kernel panic errors

Kernel panic error messages occur when the grub configuration or initramfs file is corrupted. If a problem with the kernel exists, you might see the error "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)" in the console output.

To resolve kernel panic errors:

1.    Revert the kernel to a previous, stable kernel. For instructions on how to revert to a previous kernel, see How do I revert to a known stable kernel after an update prevents my Amazon EC2 instance from rebooting successfully?

2.    After you revert to a previous kernel, reboot the instance. Then, correct the issues on the corrupted kernel.

Dependency failed errors

Instances enter emergency mode if there are auto-mount failures because of syntax errors in the /etc/fstab file. Also, if an Amazon Elastic Block Store (Amazon EBS) volume listed in the file is detached from the instance, then the boot process might enter emergency mode. If either of these problems occurs, then the console output looks similar to the following:

-------------------------------------------------------------------------------------------------------------------
[DEPEND] Dependency failed for /mnt.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for Migrate local... structure to the new structure.
[DEPEND] Dependency failed for Relabel all filesystems, if necessary.
[DEPEND] Dependency failed for Mark the need to relabel after reboot.
[DEPEND] Dependency failed for File System Check on /dev/xvdf.
-------------------------------------------------------------------------------------------------------------------

The preceding example log messages show that the /mnt mount point failed to mount during the boot sequence.

To prevent the boot sequence from entering emergency mode because of mount failures, make the following changes in the /etc/fstab file:

  • Add the nofail option to the entries for secondary partitions (/mnt, in the preceding example). With the nofail option, the boot sequence continues even if a volume or partition fails to mount.
  • Set the last column (the fsck pass number) of each affected entry to 0. A value of 0 turns off the boot-time file system check for that mount point so that the instance boots successfully.
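For example, a secondary-volume entry with both changes applied might look like the following (the UUID is a placeholder for your volume's UUID):

```
UUID=<your-volume-uuid>  /mnt  ext4  defaults,nofail  0  0
```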

There are three methods you can use to correct the /etc/fstab file.

Important:

Methods 2 and 3 require a stop and start of the instance. Be aware of the following:

  • Data stored in instance store volumes is lost when the instance is stopped. Make sure that you save a backup of the data before stopping the instance. Unlike EBS-backed volumes, instance store volumes are ephemeral and don't support data persistence.
  • The public IPv4 address that Amazon EC2 automatically assigned to the instance at launch or start changes after the stop and start. To keep a public IPv4 address that doesn't change when the instance is stopped, use an Elastic IP address.

For more information, see What happens when you stop an instance.
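If you need to keep the instance's public IPv4 address, you can allocate and associate an Elastic IP address with the AWS CLI before you stop the instance. A minimal sketch (the instance ID is a placeholder):

```shell
# Allocate a new Elastic IP address and capture its allocation ID.
ALLOCATION_ID=$(aws ec2 allocate-address --domain vpc \
  --query 'AllocationId' --output text)

# Associate the Elastic IP address with the instance.
aws ec2 associate-address \
  --instance-id i-0123456789abcdef0 \
  --allocation-id "$ALLOCATION_ID"
```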

Method 1: Use the EC2 Serial Console

If you turned on the EC2 serial console for Linux, then you can use it to troubleshoot supported Nitro-based instance types and bare metal instances. You can access the serial console from the Amazon EC2 console or the AWS Command Line Interface (AWS CLI). You don't need a working network connection to your instance to use the EC2 serial console.

Note: If you haven't previously used the EC2 serial console, then make sure that you review the prerequisites and configure access before trying to connect. If your instance is unreachable and you haven't configured access to the serial console, then follow the instructions in Method 2 or Method 3.

1.    Open the Amazon EC2 console.

2.    Choose Instances.

3.    Select the instance, then choose Actions, Monitor and troubleshoot, EC2 Serial Console, Connect.

-or-

Select the instance, then choose Connect, EC2 Serial Console, Connect.

An in-browser terminal window opens.

4.    Press Enter. If you're connected to the serial console, then a login prompt appears. If the screen remains black, then review the EC2 serial console troubleshooting guidance to resolve connection issues.

5.    At the login prompt, enter the username of the password-based user that you set up previously, and then press Enter.

6.    At the Password prompt, enter the password, and then press Enter.

You are now logged in to the instance and can use the EC2 serial console for troubleshooting.

You can also connect using your own key and an SSH client.

For more information on using the EC2 serial console, see Connect to the EC2 serial console.
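To connect with your own key and an SSH client instead of the in-browser terminal, you can push a public key to the serial console service with EC2 Instance Connect and then open an SSH session. A sketch, assuming the instance runs in us-east-1 (the instance ID and key paths are placeholders):

```shell
# Push your SSH public key to the serial console service.
# The pushed key remains valid for only 60 seconds.
aws ec2-instance-connect send-serial-console-ssh-public-key \
  --instance-id i-0123456789abcdef0 \
  --serial-port 0 \
  --ssh-public-key file://~/.ssh/id_rsa.pub \
  --region us-east-1

# Open the serial console session over SSH within that window.
ssh -i ~/.ssh/id_rsa \
  i-0123456789abcdef0.port0@serial-console.ec2-instance-connect.us-east-1.aws
```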

Method 2: Run the AWSSupport-ExecuteEC2Rescue automation document

If your instance is configured for AWS Systems Manager, then you can run the AWSSupport-ExecuteEC2Rescue automation document to correct boot issues. Manual intervention isn't needed when using this method. For information on using the automation document, see Run the EC2Rescue tool on unreachable instances.
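As a sketch, you can also start the automation from the AWS CLI; the AWSSupport-ExecuteEC2Rescue document takes the unreachable instance's ID as its main parameter (the instance ID is a placeholder):

```shell
# Start the EC2Rescue automation against the impaired instance.
aws ssm start-automation-execution \
  --document-name "AWSSupport-ExecuteEC2Rescue" \
  --parameters "UnreachableInstanceId=i-0123456789abcdef0"
```

You can then track the automation's progress on the Systems Manager Automation console page.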

Method 3: Manually edit the file using a rescue instance

1.    Open the Amazon EC2 console.

2.    Choose Instances, and then select the instance that's in emergency mode.

3.    Stop the instance.

4.    Detach the Amazon EBS root volume (/dev/xvda or /dev/sda1) from the stopped instance.

5.    Launch a new EC2 instance in the same Availability Zone as the impaired instance. The new instance becomes your rescue instance.

6.    Attach the root volume that you detached in step 4 to the rescue instance as a secondary device.

Note: You can use different device names when attaching secondary volumes.

7.    Connect to your rescue instance using SSH.

8.    Create a mount point directory for the new volume that you attached to the rescue instance in step 6. In the following example, the mount point directory is /mnt/rescue.

$ sudo mkdir /mnt/rescue

9.    Mount the volume at the directory you created in step 8.

$ sudo mount /dev/xvdf /mnt/rescue

Note: The device (/dev/xvdf, in the preceding example) might be attached to the rescue instance with a different device name. Use the lsblk command to view your available disk devices along with their mount points to determine the correct device names.

10.    After the volume is mounted, run the following command to open the /etc/fstab file.

$ sudo vi /mnt/rescue/etc/fstab

11.    Edit the entries in /etc/fstab as needed. The following example output shows three EBS volumes defined with UUIDs, the nofail option added for both secondary volumes, and a 0 as the last column for each entry.

------------------------------------------------------------------------------------------
$ cat /etc/fstab
UUID=e75a1891-3463-448b-8f59-5e3353af90ba  /  xfs  defaults,noatime  1  0
UUID=87b29e4c-a03c-49f3-9503-54f5d6364b58  /mnt/rescue  ext4  defaults,noatime,nofail  1  0
UUID=ce917c0c-9e37-4ae9-bb21-f6e5022d5381  /mnt  ext4  defaults,noatime,nofail  1  0  
------------------------------------------------------------------------------------------
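Before unmounting, you can optionally check the edited file for syntax errors with the findmnt command from util-linux (available on most distributions). The --tab-file option points the check at the file on the rescue mount instead of the rescue instance's own /etc/fstab:

```shell
# Verify fstab syntax; a non-zero exit status indicates parse errors.
sudo findmnt --verify --tab-file /mnt/rescue/etc/fstab
```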

12.    Save the file, and then run the umount command to unmount the volume.

$ sudo umount /mnt/rescue

13.    Detach the volume from the temporary instance.

14.    Attach the volume to the original instance, and then start the instance to confirm that it boots successfully.
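The stop, detach, attach, and start steps above can also be scripted with the AWS CLI. A minimal sketch (the instance IDs, volume ID, and device names are placeholders; confirm the correct values in the console and with lsblk first):

```shell
# Stop the impaired instance and wait until it's fully stopped.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0

# Detach the root volume from the impaired instance.
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 wait volume-available --volume-ids vol-0123456789abcdef0

# Attach the volume to the rescue instance as a secondary device.
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0fedcba9876543210 \
  --device /dev/sdf

# ...edit /etc/fstab on the rescue instance, then reverse the steps:
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 wait volume-available --volume-ids vol-0123456789abcdef0
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device /dev/xvda
aws ec2 start-instances --instance-ids i-0123456789abcdef0
```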

AWS OFFICIAL
Updated 9 months ago