Re: kernel BUG at fs/ext4/inode.c:1721!

From: Eric Whitney
Date: Mon Oct 11 2021 - 19:11:30 EST


* Borislav Petkov <bp@xxxxxxxxx>:
> Hi Eric,
>
> On Fri, Oct 08, 2021 at 01:33:05PM -0400, Eric Whitney wrote:
> > Hi, Boris - thanks very much for your report.
>
> sure, np.
>
> > Was your kernel configured with the CONFIG_FS_ENCRYPTION option?
>
> $ grep CONFIG_FS_ENCRYPTION /boot/config-5.15.0-rc4+
> # CONFIG_FS_ENCRYPTION is not set
>
> > Could you please provide the output of the mount command for the affected
> > file system?
>
> Well, I can't figure out from dmesg - it's all I have from that run -
> which fs it was. So lemme give you all ext4 ones:
>
> $ mount | grep ext4
> /dev/nvme0n1p2 on / type ext4 (rw,relatime,errors=remount-ro)
> /dev/sdc1 on /home type ext4 (rw,noatime)
> /dev/sda1 on /mnt/oldhome type ext4 (rw,noatime)
> /dev/sdb1 on /mnt/smr type ext4 (rw,noatime)
> /dev/nvme1n1p1 on /mnt/kernel type ext4 (rw,nosuid,nodev,noatime,user)
>
> > Do you recall what sort of code might have been running on this system at
> > the time of failure (for example, kernel build, desktop apps, etc.)?
>
> Good question. I'm not sure. Kernel build is likely as I do those on
> that workstation constantly.
>
> Unfortunately, I don't have an exact reproducer. And I can't debug stuff
> on that box since it is my workstation and I've reverted it to 5.14.
>
> What I can do is, I can slap 5.15-rc4 or whichever version you'd want me
> to, on a test box and try running kernel builds or some other load to
> see whether it would fire. I have a similar box to my workstation.
>
> Or if you have a better idea...

Hi, Boris:

I've tried numerous kernel builds with -rc4 and rerun the full set of xfstests
we use for ext4 regression testing on each rc, using a kernel that doesn't
enable FS_ENCRYPTION (I normally run with it enabled), without reproducing the
failure. The code that caused the splat you saw is new and runs when an
assertion is violated, suggesting that there may be an unsuspected bug
elsewhere in ext4.

Do you recall having seen any evidence of ENOMEM or ENOSPC conditions prior
to the failure?

If you're willing to share, please send along your kernel config file and I'll
try working with that as well.

In the meantime, should this bug get in your way, just revert the following
patch and you should be able to run without further trouble:

948ca5f30e1d "ext4: enforce buffer head state assertion in ext4_da_map_blocks"

I'll likely be posting a patch to revert this shortly, since it's going to
take some time to sort out what's going on without a reproducer.

Thanks again for your help,
Eric

>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette