Re: poisoned pages do not play well in the buddy allocator

From: Naoya Horiguchi
Date: Mon Aug 26 2019 - 21:41:30 EST


On Mon, Aug 26, 2019 at 12:41:50PM +0200, Oscar Salvador wrote:
> Hi,
>
> When analyzing a problem reported by one of our customers, I stumbbled upon an issue
> that origins from the fact that poisoned pages end up in the buddy allocator.
>
> Let me break down the stepts that lie to the problem:
>
> 1) We soft-offline a page
> 2) Page gets flagged as HWPoison and is being sent to the buddy allocator.
> This is done through set_hwpoison_free_buddy_page().
> 3) Kcompactd wakes up in order to perform some compaction.
> 4) compact_zone() will call migrate_pages()
> 5) migrate_pages() will try to get a new page from compaction_alloc() to migrate to
> 6) if cc->freelist is empty, compaction_alloc() will call isolate_free_pagesblock()
> 7) isolate_free_pagesblock only checks for PageBuddy() to assume that a page is OK
> to be used to migrate to. Since HWPoisoned page are also PageBuddy, we add
> the page to the list. (same problem exists in fast_isolate_freepages()).
>
> The outcome of that is that we end up happily handing poisoned pages in compaction_alloc,
> so if we ever got a fault on that page through *_fault, we will return VM_FAULT_HWPOISON,
> and the process will be killed.
>
> I first though that I could get away with it by checking PageHWPoison in
> {fast_isolate_freepages/isolate_free_pagesblock}, but not really.
> It might be that the page we are checking is an order > 0 page, so the first page
> might not be poisoned, but the one the follows might be, and we end up in the
> same situation.

Yes, this is a whole point of the current implementation.

>
> After some more thought, I really came to the conclusion that HWPoison pages should not
> really be in the buddy allocator, as this is only asking for problems.
> In this case it is only compaction code, but it could be happening somewhere else,
> and one would expect that the pages you got from the buddy allocator are __ready__ to use.
>
> I __think__ that we thought we were safe to put HWPoison pages in the buddy allocator as we
> perform healthy checks when getting a page from there, so we skip poisoned pages
>
> Of course, this is not the end of the story, now that someone got a page, if he frees it,
> there is a high chance that this page ends up in a pcplist (I saw that).
> Unless we are on CONFIG_VM_DEBUG, we do not check for the health of pages got from pcplist,
> as we do when getting a page from the buddy allocator.
>
> I checked [1], and it seems that [2] was going towards fixing this kind of issue.
>
> I think it is about time to revamp the whole thing.
>
> @Naoya: I could give it a try if you are busy.

Thanks for raising hand. That's really wonderful. I think that the series [1] is not
merge yet but not rejected yet, so feel free to reuse/update/revamp it.

>
> [1] https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@xxxxxxxxxxxxx/
> [2] https://lore.kernel.org/linux-mm/1541746035-13408-9-git-send-email-n-horiguchi@xxxxxxxxxxxxx/
>
> --
> Oscar Salvador
> SUSE L3
>