Re: [PATCH 1/3] vfs: add new f_op->syncfs vector

From: Jeff Layton
Date: Thu Dec 17 2020 - 14:50:15 EST


On Thu, 2020-12-17 at 00:49 +0000, Al Viro wrote:
> [Christoph added to Cc...]
> On Wed, Dec 16, 2020 at 06:31:47PM -0500, Vivek Goyal wrote:
> > Current implementation of __sync_filesystem() ignores the return code
> > from ->sync_fs(). I am not sure why that's the case. There must have
> > been some historical reason for this.
> >
> > Ignoring ->sync_fs() return code is problematic for overlayfs where
> > it can return error if sync_filesystem() on upper super block failed.
> > That error will simply be lost and sycnfs(overlay_fd), will get
> > success (despite the fact it failed).
> >
> > If we modify existing implementation, there is a concern that it will
> > lead to user space visible behavior changes and break things. So
> > instead implement a new file_operations->syncfs() call which will
> > be called in syncfs() syscall path. Return code from this new
> > call will be captured. And all the writeback error detection
> > logic can go in there as well. Only filesystems which implement
> > this call get affected by this change. Others continue to fallback
> > to existing mechanism.
>
> That smells like a massive source of confusion down the road. I'd just
> looked through the existing instances; many always return 0, but quite
> a few sometimes try to return an error:
> fs/btrfs/super.c:2412: .sync_fs = btrfs_sync_fs,
> fs/exfat/super.c:204: .sync_fs = exfat_sync_fs,
> fs/ext4/super.c:1674: .sync_fs = ext4_sync_fs,
> fs/f2fs/super.c:2480: .sync_fs = f2fs_sync_fs,
> fs/gfs2/super.c:1600: .sync_fs = gfs2_sync_fs,
> fs/hfsplus/super.c:368: .sync_fs = hfsplus_sync_fs,
> fs/nilfs2/super.c:689: .sync_fs = nilfs_sync_fs,
> fs/ocfs2/super.c:139: .sync_fs = ocfs2_sync_fs,
> fs/overlayfs/super.c:399: .sync_fs = ovl_sync_fs,
> fs/ubifs/super.c:2052: .sync_fs = ubifs_sync_fs,
> is the list of such. There are 4 method callers:
> dquot_quota_sync(), dquot_disable(), __sync_filesystem() and
> sync_fs_one_sb(). For sync_fs_one_sb() we want to ignore the
> return value; for __sync_filesystem() we almost certainly
> do *not* - it ends with return __sync_blockdev(sb->s_bdev, wait),
> after all. The question for that one is whether we want
> __sync_blockdev() called even in case of ->sync_fs() reporting
> a failure, and I suspect that it's safer to call it anyway and
> return the first error value we'd got. No idea about quota
> situation.
>

The only problem there is that makes it a bit difficult to override the
error return to syncfs, which is really what overlayfs wants to be able
to do. Their syncfs syncs out the upper layer, so it makes sense to just
have their file->f_sb_err track the upper layer's sb->s_wb_err.

You can plumb the errors from sync_fs all the way through to the syncfs
syscall, but we can't currently tell whether we're doing the sync_fs op
on behalf of sync(), syncfs() or something else entirely. We need to
ensure that if it does return an error, that it doesn't get dropped on
the floor.

I think it'd be simpler to just add f_op->syncfs and change
s_op->sync_fs to a different name, to lessen the confusion.
s_op->sync_fs sort of makes it look like you're implementing syncfs(2),
but there's a bit more to it than that.

Maybe s_op->sync_filesystem? There are only about 113 instances
"sync_fs" in the tree. Changing the name might also help highlight the
fact that the return code won't be ignored like it used to be.
--
Jeff Layton <jlayton@xxxxxxxxxx>