Re: [PATCH v1 3/7] mm: rename and move page_is_poisoned()

From: Aili Yao
Date: Thu May 06 2021 - 04:52:40 EST


On Thu, 6 May 2021 09:55:51 +0200
Michal Hocko <mhocko@xxxxxxxx> wrote:

> On Thu 06-05-21 15:28:05, Aili Yao wrote:
> > On Thu, 6 May 2021 09:06:14 +0200
> > Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > > On Thu 06-05-21 08:56:11, Aili Yao wrote:
> > > > On Wed, 5 May 2021 15:27:39 +0200
> > > > Michal Hocko <mhocko@xxxxxxxx> wrote:
> [...]
> > > > > I am not sure I follow. My point is that I fail to see any added value
> > > > > of the check as it doesn't prevent the race (it fundamentally cannot as
> > > > > the page can be poisoned at any time) but the failure path doesn't
> > > > > put_page which is incorrect even for hwpoison pages.
> > > >
> > > > Sorry, I have something to say:
> > > >
> > > > I have noticed the ref count leak in the previous topic ,but I don't think
> > > > it's a really matter. For memory recovery case for user pages, we will keep one
> > > > reference to the poison page so the error page will not be freed to buddy allocator.
> > > > which can be checked in memory_faulure() function.
> > >
> > > So what would happen if those pages are hwpoisoned from userspace rather
> > > than by HW. And repeatedly so?
> >
> > Sorry, I may be not totally understand what you mean.
> >
> > Do you mean hard page offline from mcelog?
>
> No I mean soft hwpoison from userspace (e.g. by MADV_HWPOISON but there
> are other interfaces AFAIK).
>
> And just to be explicit. All those interfaces are root only
> (CAP_SYS_ADMIN) so I am not really worried about any malitious abuse of
> the reference leak. I am mostly concerned that this is obviously broken
> without a good reason. The most trivial fix would have been to put_page
> in the return path but as I've mentioned in other email thread the fix
> really needs a deeper thought and consider other things.
>
> Hope that clarifies this some more.

Thanks, got it!
Yes, there are some test scenarios that should be covered.

But for test, the default SIGBUS handlers is usually replaced, and the test process
may not hit the coredump code.

Anyway, there is a ref leak in the normal enviorments and better to be fixed.

Thanks!
Aili Yao