RE: [PATCH v1 4/4] mm/hwpoison: Fix incorrect "not recovered" report for recovered clean pages
From: Luck, Tony
Date: Fri Feb 14 2025 - 11:52:40 EST
> > Then the patch will be like:
> >
> > @@ -883,10 +883,9 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
> >  			      (void *)&priv);
> >  	if (ret == 1 && priv.tk.addr)
> >  		kill_proc(&priv.tk, pfn, flags);
> > -	else
> > -		ret = 0;
> >  	mmap_read_unlock(p->mm);
> > -	return ret > 0 ? -EHWPOISON : -EFAULT;
> > +
> > +	return ret > 0 ? -EHWPOISON : 0;
> >
> > Here, returning 0 indicates that memory_failure() successfully handled the
> > error by dropping the clean page.
>
> I'm not sure whether there's another scenario that can make walk_page_range() return 0. But if the
> only reason for walk_page_range() returning 0 is that the poisoned page is a clean page and it was
> dropped, then this modification should be appropriate. With this change, the callers will no longer
> send SIGBUS. They might need to be changed too.
Note that there shouldn't be a SIGBUS when the action was "dropping a clean page". Full recovery
is possible in this case: the user process takes a #PF, and Linux allocates a new page and fills
it by reading from storage.
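
To make that contract concrete, here is a rough userspace sketch (not kernel code) of the
return values after the proposed change. The function name, the SKETCH_EHWPOISON constant and
the sigbus_sent out-parameter are made up for illustration; only the
"ret > 0 ? -EHWPOISON : 0" mapping and the kill_proc() condition are taken from the diff above.

```c
#include <stdbool.h>

/* Stand-in for the kernel's EHWPOISON so the sketch builds in userspace. */
#define SKETCH_EHWPOISON 133

/*
 * Hypothetical model of the tail of kill_accessing_process() with the
 * quoted diff applied.
 *   walk_ret    models the value returned by walk_page_range()
 *   have_addr   models priv.tk.addr being non-zero
 *   sigbus_sent reports whether kill_proc() would have been called
 */
static int sketch_kill_accessing_result(int walk_ret, bool have_addr,
					bool *sigbus_sent)
{
	/* SIGBUS only when the walk found the poisoned pfn mapped at an address. */
	*sigbus_sent = (walk_ret == 1 && have_addr);

	/* 0 now means "handled, nothing to kill", e.g. a dropped clean page. */
	return walk_ret > 0 ? -SKETCH_EHWPOISON : 0;
}
```

With walk_ret == 0 (the clean page was already dropped) the sketch returns 0 and sends no
signal, so the task can simply retry the access and take the #PF.
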
-Tony