Re: [mm 4.15-rc8] Random oopses under memory pressure.

From: Linus Torvalds
Date: Wed Jan 17 2018 - 16:39:55 EST


On Wed, Jan 17, 2018 at 3:08 AM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> I needed to bisect between 4.10 and 4.11, and I got plausible culprit.
> [...]
> git bisect bad b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e
> # first bad commit: [b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e] mm, page_alloc: Add missing check for memory holes

Ok, that is indeed much more likely, and very much matches the whole
"this problem only happens with sparsemem" issue.

In fact, the whole

pfn_valid_within(buddy_pfn)

test looks very odd. Maybe the pfn of the buddy is valid, but it's not
in the same zone? Then we'd combine the two pages in two different
zones into one combined page.

Maybe that's why HIGHMEM matters? The low DMA zone is obviously
aligned in the whole PAGE_ORDER range. But the highmem zone might not
be. I used to know the highmem code, but I've happily forgotten
everything. But I think we end up deciding on some random non-aligned
number in the 900MB range as being the limit between the regular zone
and the HIGHMEM zone.

So maybe something like this to test the theory?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 76c9688b6a0a..f919a5548943 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -756,6 +756,8 @@ static inline void rmv_page_order(struct page *page)
static inline int page_is_buddy(struct page *page, struct page *buddy,
unsigned int order)
{
+ if (WARN_ON_ONCE(page_zone(page) != page_zone(buddy)))
+ return 0;
if (page_is_guard(buddy) && page_order(buddy) == order) {
if (page_zone_id(page) != page_zone_id(buddy))
return 0;

I don't know. Does that warning trigger for you?

The above is completely untested. It might not compile. If it compiles
it might not work. And even if it "works", it might not matter,
because perhaps the boundary between regular memory and HIGHMEM is
already sufficiently aligned.

Comments?

Linus