Re: [PATCHSETS] v14 fsdax-rmap + v11 fsdax-reflink

From: Dan Williams
Date: Wed May 11 2022 - 11:47:12 EST


On Wed, May 11, 2022 at 8:21 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Tue, May 10, 2022 at 10:24:28PM -0700, Andrew Morton wrote:
> > On Tue, 10 May 2022 19:43:01 -0700 "Darrick J. Wong" <djwong@xxxxxxxxxx> wrote:
> >
> > > On Tue, May 10, 2022 at 07:28:53PM -0700, Andrew Morton wrote:
> > > > On Tue, 10 May 2022 18:55:50 -0700 Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > > >
> > > > > > It'll need to be a stable branch somewhere, but I don't think it
> > > > > > > really matters where as long as it's merged into the xfs for-next
> > > > > > tree so it gets filesystem test coverage...
> > > > >
> > > > > So how about let the notify_failure() bits go through -mm this cycle,
> > > > > > if Andrew will have it, and then the reflink work has a clean v5.19-rc1
> > > > > baseline to build from?
> > > >
> > > > What are we referring to here? I think a minimal thing would be the
> > > > memremap.h and memory-failure.c changes from
> > > > https://lkml.kernel.org/r/20220508143620.1775214-4-ruansy.fnst@xxxxxxxxxxx ?
> > > >
> > > > Sure, I can scoot that into 5.19-rc1 if you think that's best. It
> > > > would probably be straining things to slip it into 5.19.
> > > >
> > > > The use of EOPNOTSUPP is a bit suspect, btw. It *sounds* like the
> > > > right thing, but it's a networking errno. I suppose livable with if it
> > > > never escapes the kernel, but if it can get back to userspace then a
> > > > user would be justified in wondering how the heck a filesystem
> > > > operation generated a networking errno?
> > >
> > > <shrug> most filesystems return EOPNOTSUPP rather enthusiastically when
> > > they don't know how to do something...
> >
> > Can it propagate back to userspace?
>
> AFAICT, the new code falls back to the current (mf_generic_kill_procs)
> failure code if the filesystem doesn't provide a ->memory_failure
> function or if it returns -EOPNOTSUPP. mf_generic_kill_procs can also
> return -EOPNOTSUPP, but all the memory_failure() callers (madvise, etc.)
> convert that to 0 before returning it to userspace.
>
> I suppose the weirder question is going to be what happens when madvise
> starts returning filesystem errors like EIO or EFSCORRUPTED when pmem
> loses half its brains and even the fs can't deal with it.

Even then that notification is not in a system call context so it
would still result in a SIGBUS notification, not an EOPNOTSUPP return
code. The only potential gap I see is which error codes
MADV_SOFT_OFFLINE might return. The man page is silent on soft
offline failure codes. Shiyang, that's something to check / update if
necessary.
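
For illustration, the fallback flow Darrick describes above can be modeled
in a few lines of plain userspace C. This is only a sketch of the described
behavior, not the kernel code: handle_failure(), caller_result(), the
notify_fn type and the stub hooks are hypothetical names made up for this
example, and the real dispatch in memory-failure.c may differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a failure-notification hook (not a kernel type). */
typedef int (*notify_fn)(unsigned long pfn);

/* Stub hooks used only to exercise the model below. */
static int fs_handles(unsigned long pfn)          { (void)pfn; return 0; }
static int fs_unsupported(unsigned long pfn)      { (void)pfn; return -EOPNOTSUPP; }
static int fs_eio(unsigned long pfn)              { (void)pfn; return -EIO; }
static int generic_ok(unsigned long pfn)          { (void)pfn; return 0; }
static int generic_unsupported(unsigned long pfn) { (void)pfn; return -EOPNOTSUPP; }

/*
 * Model of the described dispatch: try the filesystem hook first and
 * fall back to the generic path when the hook is absent or reports
 * -EOPNOTSUPP. Any other result from the hook is returned as-is.
 */
static int handle_failure(notify_fn fs_hook, notify_fn generic,
                          unsigned long pfn)
{
    int rc = fs_hook ? fs_hook(pfn) : -EOPNOTSUPP;

    if (rc == -EOPNOTSUPP)
        rc = generic(pfn);
    return rc;
}

/*
 * Model of a syscall-side caller that converts -EOPNOTSUPP to 0 before
 * the value can reach userspace; other errors pass through unchanged.
 */
static int caller_result(int rc)
{
    return rc == -EOPNOTSUPP ? 0 : rc;
}

/* Small self-check covering the cases discussed in the thread. */
static void demo(void)
{
    /* No fs hook: the generic path decides the outcome. */
    assert(handle_failure(NULL, generic_ok, 1) == 0);
    /* Hook handles the failure itself. */
    assert(handle_failure(fs_handles, generic_unsupported, 1) == 0);
    /* Both paths decline: the caller still hands 0 to userspace. */
    assert(caller_result(handle_failure(fs_unsupported,
                                        generic_unsupported, 1)) == 0);
    /* A filesystem error such as -EIO is not masked in this model. */
    assert(caller_result(handle_failure(fs_eio, generic_ok, 1)) == -EIO);
}
```

In this model only -EOPNOTSUPP is squashed, so an error such as -EIO from
the filesystem hook would pass through the caller, which is the "weirder
question" raised above.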