Re: regression: data corruption with ext4 on LUKS on nvme with torvalds master

From: Alex Xu (Hello71)
Date: Sat May 08 2021 - 22:30:13 EST


Excerpts from Alex Xu (Hello71)'s message of May 8, 2021 1:54 pm:
> Hi all,
>
> Using torvalds master, I recently encountered data corruption on my ext4
> volume on LUKS on NVMe. Specifically, during heavy writes, the system
> partially hangs; SysRq-W shows that processes are blocked in the kernel
> on I/O. After forcibly rebooting, chunks of files are replaced with
> other, unrelated data. I'm not sure exactly what the data is; some of it
> is unknown binary data, but in at least one case, a list of file paths
> was inserted into a file, indicating that the data is misdirected after
> encryption.
>
> This issue appears to affect files receiving writes in the temporal
> vicinity of the hang, but affects both new and old data: for example, my
> shell history file was corrupted up to many months before.
>
> The drive reports no SMART issues.
>
> I believe this is a regression in the kernel related to something merged
> in the last few days, as it consistently occurs with my most recent
> kernel versions, but disappears when reverting to an older kernel.
>
> I haven't investigated further, such as by bisecting. I hope this is
> sufficient information to give someone a lead on the issue, and if it is
> a bug, nail it down before anybody else loses data.
>
> Regards,
> Alex.
>

The following test reproduces a hang, which I guess may have the same
cause as the corruption:

host$ cd /tmp
host$ truncate -s 10G drive
host$ qemu-system-x86_64 -drive format=raw,file=drive,if=none,id=drive -device nvme,drive=drive,serial=1 [... more VM setup options]
guest$ cryptsetup luksFormat /dev/nvme0n1
[accept warning, use any password]
guest$ cryptsetup open /dev/nvme0n1 test
[enter password]
guest$ mkfs.ext4 /dev/mapper/test
[normal output...]
Creating journal (16384 blocks): [hangs forever]
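
For anyone who wants to script the guest-side steps, here is a minimal sketch, not what I actually ran. It assumes bash, the same device and mapping name as above, and a throwaway passphrase; the 120-second deadline is an arbitrary guess. The helper polls instead of using timeout(1), since a task stuck in uninterruptible I/O ignores signals and timeout may then never return.

```shell
#!/bin/bash
# Sketch only. run_with_deadline and reproduce are hypothetical helper
# names, not part of any existing tool.

# run_with_deadline SECONDS CMD [ARGS...]
# Runs CMD in the background and polls once per second.
# Returns CMD's exit status, or 124 if CMD is still running at the deadline.
# The command is left running in that case: a task in state D cannot be killed.
run_with_deadline() {
    local deadline=$1
    shift
    "$@" &
    local pid=$!
    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            echo "HANG: '$*' still running after ${deadline}s (pid $pid)" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}

# Guest-side reproduction. DESTROYS all data on $dev.
# Device, mapping name, passphrase and deadline are assumptions.
reproduce() {
    local dev=/dev/nvme0n1 name=test pass=test
    printf '%s' "$pass" | cryptsetup -q luksFormat "$dev" --key-file=- || return
    printf '%s' "$pass" | cryptsetup open "$dev" "$name" --key-file=- || return
    run_with_deadline 120 mkfs.ext4 -q "/dev/mapper/$name"
}
```

Calling `reproduce` as root inside the guest should return 0 on a good kernel and 124 when mkfs.ext4 hangs as described above.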

I bisected this issue to:

cd2c7545ae1beac3b6aae033c7f31193b3255946 is the first bad commit
commit cd2c7545ae1beac3b6aae033c7f31193b3255946
Author: Changheun Lee <nanich.lee@xxxxxxxxxxx>
Date: Mon May 3 18:52:03 2021 +0900

    bio: limit bio max size

I didn't try reverting this commit or further reducing the test case.
Let me know if you need my kernel config or other information.
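
In case someone wants to repeat the bisection automatically or check a revert, here is a rough sketch of a step script for `git bisect run`. The two helper scripts are hypothetical placeholders (I am not describing my actual setup), and the good starting point is left as `<good>` because it depends on which older kernel works for you. The only firm part is the exit-code convention: 0 marks a commit good, 125 skips it, and 1 marks it bad.

```shell
#!/bin/bash
# Sketch only. ./build-kernel.sh and ./boot-and-mkfs.sh are hypothetical
# helpers: the first builds the checked-out tree, the second boots it in
# QEMU and exits non-zero if mkfs.ext4 on the LUKS volume hangs.

# Map helper results to the exit codes `git bisect run` understands.
bisect_verdict() {
    local build_rc=$1 test_rc=$2
    if [ "$build_rc" -ne 0 ]; then
        echo 125    # tree does not build: skip this commit
    elif [ "$test_rc" -eq 0 ]; then
        echo 0      # mkfs completed: good
    else
        echo 1      # hang or other failure: bad
    fi
}

# One bisection step: build, then test only if the build succeeded.
bisect_step() {
    local build_rc=0 test_rc=0
    ./build-kernel.sh || build_rc=$?
    if [ "$build_rc" -eq 0 ]; then
        ./boot-and-mkfs.sh || test_rc=$?
    fi
    return "$(bisect_verdict "$build_rc" "$test_rc")"
}

# Only act as a bisect step when explicitly asked to (assumed env guard).
if [ "${BISECT_STEP:-0}" = 1 ]; then
    bisect_step
    exit $?
fi

# Usage, from a kernel tree:
#   git bisect start
#   git bisect bad HEAD
#   git bisect good <good>
#   BISECT_STEP=1 git bisect run /path/to/this-script.sh
#
# To test a revert instead (it may not apply cleanly):
#   git revert cd2c7545ae1beac3b6aae033c7f31193b3255946
```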

Regards,
Alex.