On 2022/5/11 23:46, Dan Williams wrote:
On Wed, May 11, 2022 at 8:21 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
On Tue, May 10, 2022 at 10:24:28PM -0700, Andrew Morton wrote:
On Tue, 10 May 2022 19:43:01 -0700 "Darrick J. Wong" <djwong@xxxxxxxxxx> wrote:
On Tue, May 10, 2022 at 07:28:53PM -0700, Andrew Morton wrote:
On Tue, 10 May 2022 18:55:50 -0700 Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
It'll need to be a stable branch somewhere, but I don't think it
really matters where as long as it's merged into the xfs for-next
tree so it gets filesystem test coverage...
So how about let the notify_failure() bits go through -mm this cycle,
if Andrew will have it, and then the reflink work has a clean v5.19-rc1
baseline to build from?
What are we referring to here? I think a minimal thing would be the
memremap.h and memory-failure.c changes from
https://lkml.kernel.org/r/20220508143620.1775214-4-ruansy.fnst@xxxxxxxxxxx ?
Sure, I can scoot that into 5.19-rc1 if you think that's best. It
would probably be straining things to slip it into 5.19.
The use of EOPNOTSUPP is a bit suspect, btw. It *sounds* like the
right thing, but it's a networking errno. I suppose livable with if it
never escapes the kernel, but if it can get back to userspace then a
user would be justified in wondering how the heck a filesystem
operation generated a networking errno?
<shrug> most filesystems return EOPNOTSUPP rather enthusiastically when
they don't know how to do something...
Can it propagate back to userspace?
AFAICT, the new code falls back to the current (mf_generic_kill_procs)
failure code if the filesystem doesn't provide a ->memory_failure
function or if it returns -EOPNOTSUPP. mf_generic_kill_procs can also
return -EOPNOTSUPP, but all the memory_failure() callers (madvise, etc.)
convert that to 0 before returning it to userspace.
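To make the fallback described above concrete, here is a minimal userspace sketch in C. It is only a model, not the kernel code: memory_failure_model(), generic_kill_procs_model(), madvise_caller_model(), generic_result_model and the two hook_* helpers are hypothetical names invented for illustration, and the real functions take different arguments.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a filesystem's ->memory_failure hook. */
typedef int (*fs_memory_failure_fn)(unsigned long pfn);

/* Result the modelled generic path returns; settable for experiments. */
static int generic_result_model = 0;

/* Hypothetical stand-in for mf_generic_kill_procs(). */
static int generic_kill_procs_model(unsigned long pfn)
{
    (void)pfn;
    return generic_result_model;
}

/*
 * Try the filesystem hook first. Fall back to the generic path when
 * there is no hook or when the hook reports -EOPNOTSUPP.
 */
static int memory_failure_model(fs_memory_failure_fn hook, unsigned long pfn)
{
    if (hook) {
        int rc = hook(pfn);

        if (rc != -EOPNOTSUPP)
            return rc;
    }
    return generic_kill_procs_model(pfn);
}

/* Model of a caller such as madvise: -EOPNOTSUPP is masked to 0. */
static int madvise_caller_model(fs_memory_failure_fn hook, unsigned long pfn)
{
    int ret = memory_failure_model(hook, pfn);

    return ret == -EOPNOTSUPP ? 0 : ret;
}

/* Example hooks: one that cannot handle the failure, one that fails hard. */
static int hook_unsupported(unsigned long pfn) { (void)pfn; return -EOPNOTSUPP; }
static int hook_eio(unsigned long pfn) { (void)pfn; return -EIO; }
```

In this model, madvise_caller_model(NULL, pfn) and madvise_caller_model(hook_unsupported, pfn) both end up in the generic path and yield 0, while madvise_caller_model(hook_eio, pfn) passes -EIO straight through, since only -EOPNOTSUPP is masked.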
I suppose the weirder question is going to be what happens when madvise
starts returning filesystem errors like EIO or EFSCORRUPTED when pmem
loses half its brains and even the fs can't deal with it.
Even then that notification is not in a system call context so it
would still result in a SIGBUS notification, not an EOPNOTSUPP return
code. The only potential gap I see is what the possible error
codes are that MADV_SOFT_OFFLINE might see. The man page is silent on soft
offline failure codes. Shiyang, that's something to check / update if
necessary.
According to the code around MADV_SOFT_OFFLINE, madvise() will return -EIO when the backend is NVDIMM.
Here is the logic:
madvise_inject_error() {
    ...
    if (MADV_SOFT_OFFLINE) {
        ret = soft_offline_page() {
            ...
            /* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
            page = pfn_to_online_page(pfn);
            if (!page) {
                put_ref_page(ref_page);
                return -EIO;
            }
            ...
        }
    } else {
        ret = memory_failure()
    }
    return ret
}
--
Thanks,
Ruan.