Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected

From: Song Liu
Date: Tue Feb 06 2024 - 03:08:27 EST


On Thu, Jan 25, 2024 at 12:31 PM Dan Moulding <dan@xxxxxxxx> wrote:
>
> Hi Junxiao,
>
> I first noticed this problem the next day after I had upgraded some
> machines to the 6.7.1 kernel. One of the machines is a backup server.
> Just a few hours after the upgrade to 6.7.1, it started running its
> overnight backup jobs. Those backup jobs hung part way through. When I
> tried to check on the backups in the morning, I found the server
> mostly unresponsive. I could SSH in but most shell commands would just
> hang. I was able to run top and see that the md0_raid5 kernel thread
> was using 100% CPU. I tried to reboot the server, but it wasn't able
> to successfully shutdown and eventually I had to hard reset it.
>
> The next day, the same sequence of events occurred on that server
> again when it tried to run its backup jobs. Then the following day, I
> experienced another hang on a different machine, with a similar RAID-5
> configuration. That time I was scp'ing a large file to a virtual
> machine whose image was stored on the RAID-5 array. Part way through
> the transfer scp reported that the transfer had stalled. I checked top
> on that machine and found once again that the md0_raid5 kernel thread
> was using 100% CPU.
>
> Yesterday I created a fresh Fedora 39 VM for the purposes of
> reproducing this problem in a different environment (the other two
> machines are both Gentoo servers running v6.7 kernels straight from
> the stable trees with a custom kernel configuration). I am able to
> reproduce the problem on Fedora 39 running both the v6.6.13 stable
> tree kernel code and the Fedora 39 6.6.13 distribution kernel.
>
> On this Fedora 39 VM, I created a 1GiB LVM volume to use as the RAID-5
> journal from space on the "boot" disk. Then I attached 3 additional
> 100 GiB virtual disks and created the RAID-5 from those 3 disks and
> the write-journal device. I then created a new LVM volume group from
> the md0 array and created one LVM logical volume named "data", using
> all but 64GiB of the available VG space. I then created an ext4 file
> system on the "data" volume, mounted it, and used "dd" to copy 1MiB
> blocks from /dev/urandom to a file on the "data" file system, and just
> let it run. Eventually "dd" hangs and top shows that md0_raid5 is
> using 100% CPU.
>
> Here is an example command I just ran, which has hung after writing
> 4.1 GiB of random data to the array:
>
> test@localhost:~$ dd if=/dev/urandom bs=1M of=/data/random.dat status=progress
> 4410310656 bytes (4.4 GB, 4.1 GiB) copied, 324 s, 13.6 MB/s
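
For reference, the reproduction steps described above can be sketched as
the following script. All device names, volume group names, and sizes are
placeholders (not taken verbatim from the report); it assumes root, mdadm,
and LVM2, and is of course destructive to the named disks:

```shell
#!/bin/sh
set -e

# 1 GiB LVM volume on the "boot" disk's VG to act as the RAID-5
# write-journal ("vg_boot" is a hypothetical VG name)
lvcreate -L 1G -n lv_journal vg_boot

# RAID-5 from three 100 GiB virtual disks plus the write-journal device
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      --write-journal /dev/vg_boot/lv_journal \
      /dev/vdb /dev/vdc /dev/vdd

# VG on the array, then one LV named "data" using all but 64 GiB of the
# VG space (a 3-disk RAID-5 of 100 GiB members yields ~200 GiB usable,
# so roughly -L 136G; adjust to your actual VG size)
vgcreate vg_md /dev/md0
lvcreate -L 136G -n data vg_md
mkfs.ext4 /dev/vg_md/data
mkdir -p /data
mount /dev/vg_md/data /data

# Write 1 MiB blocks of random data until the hang reproduces
# (top shows md0_raid5 pegged at 100% CPU)
dd if=/dev/urandom of=/data/random.dat bs=1M status=progress
```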

Update on this..

I have been testing the following configuration on the md-6.9 branch [1].
The array works fine AFAICT.

Dan, could you please run the test on this branch
(commit 83cbdaf61b1ab9cdaa0321eeea734bc70ca069c8)?

Thanks,
Song


[1] https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.9

[root@eth50-1 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
vda 253:0 0 32G 0 disk
├─vda1 253:1 0 2G 0 part /boot
└─vda2 253:2 0 30G 0 part /
nvme2n1 259:0 0 50G 0 disk
└─md0 9:0 0 100G 0 raid5
├─vg--md--data-md--data-real 250:2 0 50G 0 lvm
│ ├─vg--md--data-md--data 250:1 0 50G 0 lvm /mnt/2
│ └─vg--md--data-snap 250:4 0 50G 0 lvm
└─vg--md--data-snap-cow 250:3 0 49G 0 lvm
└─vg--md--data-snap 250:4 0 50G 0 lvm
nvme0n1 259:1 0 50G 0 disk
└─md0 9:0 0 100G 0 raid5
├─vg--md--data-md--data-real 250:2 0 50G 0 lvm
│ ├─vg--md--data-md--data 250:1 0 50G 0 lvm /mnt/2
│ └─vg--md--data-snap 250:4 0 50G 0 lvm
└─vg--md--data-snap-cow 250:3 0 49G 0 lvm
└─vg--md--data-snap 250:4 0 50G 0 lvm
nvme1n1 259:2 0 50G 0 disk
└─md0 9:0 0 100G 0 raid5
├─vg--md--data-md--data-real 250:2 0 50G 0 lvm
│ ├─vg--md--data-md--data 250:1 0 50G 0 lvm /mnt/2
│ └─vg--md--data-snap 250:4 0 50G 0 lvm
└─vg--md--data-snap-cow 250:3 0 49G 0 lvm
└─vg--md--data-snap 250:4 0 50G 0 lvm
nvme4n1 259:3 0 2G 0 disk
nvme3n1 259:4 0 50G 0 disk
└─vg--data-lv--journal 250:0 0 512M 0 lvm
└─md0 9:0 0 100G 0 raid5
├─vg--md--data-md--data-real 250:2 0 50G 0 lvm
│ ├─vg--md--data-md--data 250:1 0 50G 0 lvm /mnt/2
│ └─vg--md--data-snap 250:4 0 50G 0 lvm
└─vg--md--data-snap-cow 250:3 0 49G 0 lvm
└─vg--md--data-snap 250:4 0 50G 0 lvm
nvme5n1 259:5 0 2G 0 disk
nvme6n1 259:6 0 4G 0 disk
[root@eth50-1 ~]# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 nvme2n1[4] dm-0[3](J) nvme1n1[1] nvme0n1[0]
104790016 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
[root@eth50-1 ~]# mount | grep /mnt/2
/dev/mapper/vg--md--data-md--data on /mnt/2 type ext4 (rw,relatime,stripe=256)