Running a ZFS boot pool for your Proxmox system can lead to unexpected boot issues if you do not copy the bootloaders after replacing a disk that failed.
When ZFS resilvers onto a replacement disk, it only copies the pool data. The boot partitions on the new disk are left untouched or are never created.
To make sure that your replacement disk can still boot the system, you need to set up its partitions and bootloader yourself.
These are my notes:
Scenario: You are running a mirrored ZFS boot pool with two boot disks (/dev/sda and /dev/sdb). One of the disks (/dev/sdb) has now FAULTED and you need to replace it.
Prerequisites
- You need full sudo/root access and hardware access to install the replacement physical disk
- You should have backed up all your important files before working on this
1. Install your new disk
Install your new disk into the machine and check that the system detects it. If it is connected properly, you should be able to see it when you run the following:
$ lsblk
Double-check that the new disk is showing and take note of its device file.
If you are unsure which device file is the correct one, you can use dmesg or lsblk to look for the disk that matches the size of your new disk.
# Check recent disk additions
$ dmesg | tail -20 | grep -i "disk\|sd"
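If the machine supports hot-plugging, another way to spot the new device is to compare the disk list from before and after the install. The helper below is a minimal sketch of that idea; the function name new_disks is made up for illustration, and it assumes you captured one disk per line, for example with lsblk -dn -o NAME,SERIAL.

```shell
# new_disks BEFORE AFTER
# Prints every line of AFTER that does not appear in BEFORE.
# Both arguments are newline-separated lists, e.g. the output of
# `lsblk -dn -o NAME,SERIAL` (hypothetical helper, not part of any tool).
new_disks() {
    printf '%s\n' "$2" | while IFS= read -r line; do
        # -F: fixed string, -x: whole-line match, -q: quiet
        printf '%s\n' "$1" | grep -Fxq -- "$line" || printf '%s\n' "$line"
    done
}

# Usage sketch:
#   before=$(lsblk -dn -o NAME,SERIAL)   # run before installing the disk
#   after=$(lsblk -dn -o NAME,SERIAL)    # run after installing the disk
#   new_disks "$before" "$after"
```

If you had to power down to install the disk, device names may have shifted across the reboot, so comparing serial numbers is more reliable than comparing names alone.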
For the purpose of this scenario, let’s assume that your new disk landed on /dev/sdc.
2. Re-partition your new disk
Before you add your disk back into your boot pool, make sure that its partition table is set up properly and matches the one on your other disk.
Most boot disks will likely have the following partition layout:
/dev/sda1 - BIOS Boot
/dev/sda2 - EFI System Partition (FAT32)
/dev/sda3 - Root partition (ZFS)
This structure is needed to boot up on most modern UEFI systems.
Explainer:
You have 2 boot partitions, sda1 and sda2, for backward compatibility with legacy BIOS systems. sda1 is the BIOS boot partition used by legacy BIOS firmware, while sda2 is the EFI System Partition used by modern UEFI firmware.
You only need to restore the EFI System Partition (/dev/sda2), since most Proxmox installations boot via UEFI. The BIOS boot partition exists for compatibility and doesn’t need special handling during disk replacement: its partition entry is recreated when you replicate the partition table with sgdisk. Note that sgdisk copies the partition table only, not the data inside the partitions.
We can start by copying over the partition table using sgdisk.
Note: This is a destructive command. Make sure your new disk (/dev/sdc) has no valuable data that you have yet to retrieve or back up.
# Replicate the partition table of the current disk onto the new disk
# sgdisk -R <newdisk> <currentdisk>
$ sgdisk -R /dev/sdc /dev/sda
# Randomize the disk and partition GUIDs on the new disk to prevent conflicts after duplication
$ sgdisk -G /dev/sdc
Verify that your partitions were created
$ fdisk -l /dev/sdc
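To go beyond eyeballing the fdisk output, you could compare the two layouts mechanically. The sketch below assumes the usual column order of sgdisk -p (number, start sector, end sector, size, unit, type code); the function name partition_layout and the /tmp file names are made up for illustration.

```shell
# partition_layout: reads `sgdisk -p` output on stdin and prints
# "number start end typecode" for each partition row.
# Assumes the usual sgdisk column order; rows are recognised by a
# numeric first field, which skips the header and summary lines.
partition_layout() {
    awk 'NF >= 6 && $1 ~ /^[0-9]+$/ { print $1, $2, $3, $6 }'
}

# Usage sketch:
#   sgdisk -p /dev/sda | partition_layout > /tmp/sda.layout
#   sgdisk -p /dev/sdc | partition_layout > /tmp/sdc.layout
#   diff /tmp/sda.layout /tmp/sdc.layout && echo "layouts match"
```

An empty diff means the new disk has the same partition numbers, boundaries and type codes as the healthy one.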
3. Add Disk into Boot Pool (Replace)
Now that your disk is partitioned, you can prepare the replacement.
First, identify your boot pool using:
$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 1.81T 123G 1.69T - - 2% 6% 1.00x ONLINE -
tank 7.25T 4.52T 2.73T - - 15% 62% 1.00x ONLINE -
You might see pools like the above.
A quick way to find out which pool is your boot pool is to run df / and look for the dataset that is mounted on /
$ df /
rpool/ROOT/pve-1 1834057472 4456448 1829601024 1% /
In this example, it is rpool
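The pool name is simply the part of the dataset name before the first slash. If you want to script this lookup, a small sketch could look like the following; the function name pool_of_dataset is made up for illustration, and the usage line assumes findmnt is available and that / is a ZFS dataset.

```shell
# pool_of_dataset DATASET
# Prints the pool part of a ZFS dataset name, i.e. everything before
# the first "/". A bare pool name is printed unchanged.
pool_of_dataset() {
    printf '%s\n' "${1%%/*}"
}

# Usage sketch (findmnt prints the source of the root mount):
#   pool_of_dataset "$(findmnt -n -o SOURCE /)"
```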
Then check the status of the ZFS pools to identify the device name of the faulted disk:
$ zpool status
pool: rpool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning.
action: Restore the faulted device or replace it with a new device.
scan: none requested
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
/dev/sda3 ONLINE 0 0 0
/dev/sdb3 UNAVAILABLE 0 0 0 cannot open
Take note of the unavailable device, as this is the one you will replace.
# Replace the failed disk in ZFS pool
$ zpool replace rpool /dev/sdb3 /dev/sdc3
# Monitor resilver progress (this can take several hours for large disks)
$ watch zpool status rpool
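If you would rather block until the resilver finishes than keep an eye on watch, a polling loop is one option. The sketch below assumes that zpool status reports "resilver in progress" on its scan line while the resilver is running; the function name resilver_in_progress is made up for illustration. Newer OpenZFS releases also have a zpool wait subcommand that may cover this.

```shell
# resilver_in_progress: reads `zpool status` output on stdin and
# exits 0 while a resilver is still running, non-zero otherwise.
# Assumes the scan line contains the text "resilver in progress".
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Usage sketch: poll once a minute, then show the final state.
#   while zpool status rpool | resilver_in_progress; do sleep 60; done
#   zpool status rpool
```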
Once the resilver has completed, you can re-install the bootloader.
4. Install Bootloader (Proxmox)
Proxmox has its own boot-tool that conveniently sets up the bootloader without much work. I highly recommend that you use it.
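# Format the ESP (EFI System Partition) on the new disk
# proxmox-boot-tool format creates the FAT filesystem itself, so a separate mkfs.fat is not needed
# (if the partition already holds a filesystem, format refuses to run unless you pass --force)
$ proxmox-boot-tool format /dev/sdc2
# Initialize Proxmox boot tool on new disk
$ proxmox-boot-tool init /dev/sdc2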
# Update bootloader configuration
$ proxmox-boot-tool refresh
Once that is done, your 2nd disk should be bootable again.
5. Verify
Before you reboot your system, it is recommended to do some quick checks to verify that your disk is bootable.
Run:
$ proxmox-boot-tool status
You should see an output like:
EFISystemPartitions: /dev/sda2 (/boot/efi) /dev/sdc2 (/boot/efi)
ZFS: rpool/ROOT/pve-1
This confirms that /dev/sdc2 is registered and part of the boot environment.
But let’s check the boot manager itself:
$ efibootmgr -v
You should see something similar to:
Boot0006* Linux Boot Manager HD(2,GPT,93a85993-b65f-477e-b768-d8a8a723e3a7,0x800,0x200000)/File(\EFI\systemd\systemd-bootx64.efi)
Boot0007* Linux Boot Manager HD(2,GPT,02afafa3-750c-4f00-8f69-3dc4c3d2ca7d,0x800,0x200000)/File(\EFI\systemd\systemd-bootx64.efi)
This tells us that there’s a Linux Boot Manager entry registered for each of the following partition IDs:
1. 93a85993-b65f-477e-b768-d8a8a723e3a7
2. 02afafa3-750c-4f00-8f69-3dc4c3d2ca7d
With this information, we can double-check that these are the 2 boot disks we’ve installed by running the following for each of the partition IDs:
$ blkid | grep "93a85993-b65f-477e-b768-d8a8a723e3a7"
/dev/sda2: UUID="9E48-00CA" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="93a85993-b65f-477e-b768-d8a8a723e3a7"
If you see results for both sda and sdc, then it’s likely everything is set up correctly.
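The lookup above can also be scripted so you do not have to copy each ID by hand. This is a minimal sketch that assumes the HD(n,GPT,<id>,...) format shown in the efibootmgr output above; the function name esp_partuuids is made up for illustration.

```shell
# esp_partuuids: reads `efibootmgr -v` output on stdin and prints the
# partition ID from every entry of the form HD(n,GPT,<id>,...).
# Lines without such an entry are skipped.
esp_partuuids() {
    sed -n 's/.*HD([0-9]*,GPT,\([0-9a-fA-F-]*\),.*/\1/p'
}

# Usage sketch: resolve each ID to its device node via the udev symlinks.
#   efibootmgr -v | esp_partuuids | while IFS= read -r id; do
#       readlink -f "/dev/disk/by-partuuid/$id"
#   done
```

Each printed path should be the second partition of one of your two boot disks.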
After verification, test boot by rebooting the system to ensure everything works correctly.
Summary
With these additional steps, you can make sure that your ZFS boot mirror will still boot correctly after a disk replacement. If you’re running one of the other distros, everything is the same except Step 4, where you have to follow that distro’s own instructions for installing the bootloader onto the disk.
Hope this helps!