Re: [PATCH v1 3/7] mm: rename and move page_is_poisoned()

From: Michal Hocko
Date: Thu May 06 2021 - 04:14:01 EST


On Thu 06-05-21 15:28:05, Aili Yao wrote:
> On Thu, 6 May 2021 09:06:14 +0200
> Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> > On Thu 06-05-21 08:56:11, Aili Yao wrote:
> > > On Wed, 5 May 2021 15:27:39 +0200
> > > Michal Hocko <mhocko@xxxxxxxx> wrote:
[...]
> > > > I am not sure I follow. My point is that I fail to see any added value
> > > > of the check as it doesn't prevent the race (it fundamentally cannot as
> > > > the page can be poisoned at any time) but the failure path doesn't
> > > > put_page which is incorrect even for hwpoison pages.
> > >
> > > Sorry, I have something to say:
> > >
> > > I have noticed the ref count leak in the previous topic ,but I don't think
> > > it's a really matter. For memory recovery case for user pages, we will keep one
> > > reference to the poison page so the error page will not be freed to buddy allocator.
> > > which can be checked in memory_faulure() function.
> >
> > So what would happen if those pages are hwpoisoned from userspace rather
> > than by HW. And repeatedly so?
>
> Sorry, I may be not totally understand what you mean.
>
> Do you mean hard page offline from mcelog?

No I mean soft hwpoison from userspace (e.g. by MADV_HWPOISON but there
are other interfaces AFAIK).

And just to be explicit. All those interfaces are root only
(CAP_SYS_ADMIN) so I am not really worried about any malitious abuse of
the reference leak. I am mostly concerned that this is obviously broken
without a good reason. The most trivial fix would have been to put_page
in the return path but as I've mentioned in other email thread the fix
really needs a deeper thought and consider other things.

Hope that clarifies this some more.
--
Michal Hocko
SUSE Labs