Re: [syzbot] [f2fs?] WARNING in rcu_sync_dtor
From: Jan Kara
Date: Tue Jul 30 2024 - 08:38:19 EST
On Mon 29-07-24 09:58:47, Theodore Ts'o wrote:
> On Mon, Jul 29, 2024 at 03:27:21PM +0200, Jan Kara wrote:
> > So in ext4 we have EXT4_FLAGS_SHUTDOWN flag which we now use
> > internally instead of SB_RDONLY flag for checking whether the
> > filesystem was shut down (because otherwise races between remount and
> > hitting fs error were really messy). However we still *also* set
> > SB_RDONLY so that VFS bails early from some paths which generally
> > results in less error noise in kernel logs and also out of caution
> > of not breaking something in this path. That being said, we have also
> > supported the EXT4_IOC_SHUTDOWN ioctl for several years, and in that path
> > we set EXT4_FLAGS_SHUTDOWN without setting SB_RDONLY and nothing
> > seems to have blown up. So I'm inclined to believe we could remove
> > the setting of SB_RDONLY from ext4 error handling. Ted, what do you
> > think?
>
> Well, there are some failures of generic/388 (which involves calling
> the shutdown ioctl while running fsstress). I believe that most of
> those failures are file system corruption errors, as opposed to other
> sorts of failures, but we don't run KASAN kernels all that often,
> especially since generic/388 is now on the exclude list.
As far as I remember, those failures happened mostly because the fs
shutdown occurred in the middle of some operation on another CPU, and
this tickled unusual error handling paths that eventually resulted in
WARN_ONs and similar.
> The failure rate of generic/388 varies depending on the storage device
> involved, but it varies from less than 10% to 50% of the time, if
> memory serves correctly. Since EXT4_IOC_SHUTDOWN is used most of the
> time as a debugging/test tool (although there are some users who use it
> in production, the failure rate when you're not doing something
> really aggressive like fsstress is very small), this has been on the
> "one of these days, when we have tons of free time, we should really
> look into this" list. The challenge is fixing this in a way that doesn't
> involve adding new locking in various file system hotpaths.
>
> So "nothing seems to have blown up" might be a bit strong. But it's
> something we can try doing, and see whether it results in more rather
> than less syzbot complaints.
OK. I don't expect real trouble within the filesystem itself here, because
the only benefit the read-only check currently brings us is that the
filesystem isn't even entered in a lot of cases. But at the latest by the
time we try to start a transaction handle, we get back an error and bail out
anyway after the fs was shut down, and this is a reasonably well-tested path.
What might have a larger impact is that userspace will be getting back
EIO / EUCLEAN instead of EROFS. But I hope it won't be a big deal either.
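
To make the errno change concrete, here is a toy userspace model of the two
checks. It is only a sketch and not kernel code: struct toy_sb, toy_modify()
and toy_start_handle() are made-up names, the two booleans merely stand in
for SB_RDONLY and EXT4_FLAGS_SHUTDOWN, and it models only the EIO case, not
EUCLEAN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name below is hypothetical, not a kernel symbol. */
struct toy_sb {
	bool rdonly;	/* stands in for SB_RDONLY */
	bool shutdown;	/* stands in for EXT4_FLAGS_SHUTDOWN */
};

/* Stand-in for starting a transaction handle: fails once the fs is shut down. */
static int toy_start_handle(const struct toy_sb *sb)
{
	assert(sb != NULL);
	return sb->shutdown ? -EIO : 0;
}

/*
 * Stand-in for a modifying operation. The read-only check models the early
 * bail in VFS; *entered_fs records whether the filesystem was entered at all.
 */
static int toy_modify(const struct toy_sb *sb, bool *entered_fs)
{
	assert(sb != NULL && entered_fs != NULL);
	*entered_fs = false;
	if (sb->rdonly)
		return -EROFS;	/* fs never entered */
	*entered_fs = true;
	return toy_start_handle(sb);	/* fs entered, fails at handle start */
}
```

With both flags set, toy_modify() returns -EROFS without entering the
filesystem. With only the shutdown flag set, the filesystem is entered and the
caller sees -EIO from the handle start instead.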
> > Also as the "filesystem shutdown" is spreading across multiple
> > filesystems, I'm playing with the idea that maybe we could lift a
> > flag like this to VFS so that we can check it in VFS paths and abort
> > some operations early. But so far I'm not convinced the gain is
> > worth the need to iron out various subtle semantic differences of
> > "shutdown" among filesystems.
>
> I think that might be a good idea. Hopefully subtle semantic
> differences are ones that won't matter in terms of the VFS aborting
> operations early.
OK, I guess I'll try and see.
Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR