Re: [RFC PATCH v2 10/16] mm,hwpoison: Rework soft offline for free pages

From: Oscar Salvador
Date: Tue Oct 22 2019 - 03:46:27 EST


On Mon, Oct 21, 2019 at 05:41:58PM +0200, Michal Hocko wrote:
> On Mon 21-10-19 14:58:49, Oscar Salvador wrote:
> > Nothing prevents the page to be allocated in the meantime.
> > We would just bail out and return -EBUSY to userspace.
> > Since we do not do __anything__ to the page until we are sure we took it off,
> > and it is completely isolated from the memory, there is no danger.
>
> Wouldn't it be better to simply check the PageBuddy state after the lock
> has been taken?

We already do that:

bool take_page_off_buddy(struct page *page)
{
...
spin_lock_irqsave(&zone->lock, flags);
for (order = 0; order < MAX_ORDER; order++) {
struct page *page_head = page - (pfn & ((1 << order) - 1));
int buddy_order = page_order(page_head);
struct free_area *area = &(zone->free_area[buddy_order]);

if (PageBuddy(page_head) && buddy_order >= order) {
...
}

Actually, we __only__ call break_down_buddy_pages() (which breaks down a higher-order page
and keeps our page out of buddy) if that is true.

> > Since soft-offline is kinda "best effort" mode, it is something like:
> > "Sorry, could not poison the page, try again".
>
> Well, I would disagree here. While madvise is indeed a best effort
> operation please keep in mind that the sole purpose of this interface is
> to allow real MCE behavior. And that operation should better try
> _really_ hard to make sure we try to recover as gracefully as possible.

In this case, there is nothing to be recovered from.
If we wanted to soft-offline a page that was free, and then it was allocated
in the meantime, there is no harm in that as we do not flag the page
until we are sure it is under our control.
That means:

- for free pages: was succesfully taken off buddy
- in use pages: was freed or migrated

So, opposite to hard-offline, in soft-offline we do not fiddle with pages
unless we are sure the page is not reachable anymore by any means.

> > Now, thinking about this a bit more, I guess we could be more clever here
> > and call the routine that handles in-use pages if we see that the page
> > was allocated by the time we reach take_page_off_buddy.
> >
> > About pcp pages, you are right.
> > I thought that we were already handling that case, and we do, but looking closer the
> > call to shake_page() (that among other things spills pcppages into buddy)
> > is performed at a later stage.
> > I think we need to adjust __get_any_page to recognize pcp pages as well.
>
> Yeah, pcp pages are PITA. We cannot really recognize them now. Dropping
> all pcp pages is certainly a way to go but we need to mark the page
> before that happens.

I will work on that.

--
Oscar Salvador
SUSE L3