Re: [PATCH] Revert "mm/gup: check page posion status for coredump."

From: Michal Hocko
Date: Thu May 06 2021 - 03:02:58 EST


On Thu 06-05-21 13:47:50, Aili Yao wrote:
> On Wed, 5 May 2021 15:54:07 +0200
> Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> > From: Michal Hocko <mhocko@xxxxxxxx>
> >
> > While reviewing http://lkml.kernel.org/r/20210429122519.15183-4-david@xxxxxxxxxx
> > I came across d3378e86d182 ("mm/gup: check page posion status for
> > coredump.") and noticed that this patch is broken in two ways. First, it
> > doesn't really prevent hwpoison pages from being dumped, because hwpoison
> > pages can be marked asynchronously at any time after the check.
>
> I have reconsidered this.
> There are two cases for this coredump panic issue.
> In the first, the hwpoison flag is already set correctly when the check runs; the previous
> patch makes that case recoverable and avoids the panic.
>
> In the second, the hwpoison flag is not yet valid at the time of the check, perhaps because of
> a race. I don't think this case is worth covering, or can realistically be covered, because the
> SRAR can occur freshly during the dump process and thus can't be detected.
>
> The previous patch doesn't make the second case any worse or unacceptable; it simply can't
> cover it.
>
> So here is what the patch does:
> For most cases in this topic, the patch will work. When the hwpoison flag is not valid, it
> falls back to the behavior before this patch: it just panics.

Please propose a new fix which a) doesn't leak a page reference, b)
evaluates how realistic the scenario is, and c) explains why no other gup
user needs to care. In other words, is the gup layer really the right
place to handle this issue?
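
To illustrate point a) only, here is a minimal userspace sketch of one
common way a reference can leak: a lookup helper takes a reference first
and then bails out on a poison check without dropping it. This is not the
kernel code under discussion. The type and function names (fake_page,
fake_get_page, fake_put_page, lookup_leaky, lookup_balanced) are all
hypothetical stand-ins made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted page; not struct page. */
struct fake_page {
	int refcount;
	bool hwpoison;
};

/* Hypothetical helpers modelling "take a reference" / "drop a reference". */
static void fake_get_page(struct fake_page *p) { p->refcount++; }
static void fake_put_page(struct fake_page *p) { p->refcount--; }

/*
 * Leaky variant: the reference is taken before the poison check, and the
 * early return hands back NULL, so the caller has nothing to put back.
 */
static struct fake_page *lookup_leaky(struct fake_page *p)
{
	fake_get_page(p);
	if (p->hwpoison)
		return NULL;	/* reference taken above is never dropped */
	return p;
}

/*
 * Balanced variant: the failure path drops the reference it took, so a
 * NULL return always means "no reference held".
 */
static struct fake_page *lookup_balanced(struct fake_page *p)
{
	fake_get_page(p);
	if (p->hwpoison) {
		fake_put_page(p);
		return NULL;
	}
	return p;
}
```

With a poisoned page whose count starts at 0, lookup_leaky() returns NULL
and leaves the count at 1, while lookup_balanced() returns NULL and leaves
it at 0. Note that the sketch says nothing about b) or c): the flag can
still be set right after either check.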
--
Michal Hocko
SUSE Labs