Re: [syzbot] [f2fs?] WARNING in rcu_sync_dtor

From: Theodore Ts'o
Date: Mon Jul 29 2024 - 09:59:48 EST


On Mon, Jul 29, 2024 at 03:27:21PM +0200, Jan Kara wrote:
>
> So in ext4 we have EXT4_FLAGS_SHUTDOWN flag which we now use
> internally instead of SB_RDONLY flag for checking whether the
> filesystem was shutdown (because otherwise races between remount and
> hitting fs error were really messy). However we still *also* set
> SB_RDONLY so that VFS bails early from some paths which generally
> results in less error noise in kernel logs and also out of caution
> of not breaking something in this path. That being said we also
> support EXT4_IOC_SHUTDOWN ioctl for several years and in that path
> we set EXT4_FLAGS_SHUTDOWN without setting SB_RDONLY and nothing
> seems to have blown up. So I'm inclined to believe we could remove
> setting of SB_RDONLY from ext4 error handling. Ted, what do you
> think?

Well, there are some failures of generic/388 (which involves calling
the shutdown ioctl while running fsstress). I believe that most of
those failures are file system corruption errors, as opposed to other
sorts of failures, but we don't run KASAN kernels all that often,
especially since generic/388 is now on the exclude list.

The failure rate of generic/388 depends on the storage device
involved, ranging from under 10% to 50% of the time, if memory serves.
Since EXT4_IOC_SHUTDOWN is mostly used for debugging and testing
(although some users do use it in production, the failure rate is
very small when you're not doing something really aggressive like
fsstress), this has been on the "one of these days, when we have tons
of free time, we should really look into this" list. The challenge is
fixing this in a way that doesn't involve adding new locking in
various file system hot paths.

So "nothing seems to have blown up" might be a bit strong. But it's
something we can try doing, and see whether it results in more rather
than fewer syzbot complaints.

> Also as the "filesystem shutdown" is spreading across multiple
> filesystems, I'm playing with the idea that maybe we could lift a
> flag like this to VFS so that we can check it in VFS paths and abort
> some operations early. But so far I'm not convinced the gain is
> worth the need to iron out various subtle semantical differences of
> "shutdown" among filesystems.

I think that might be a good idea. Hopefully the subtle semantic
differences won't matter when it comes to the VFS aborting operations
early.

- Ted