Re: "mm: use early_pfn_to_nid in page_ext_init" broken on some configurations?
From: Vlastimil Babka
Date: Fri Jul 14 2017 - 05:34:38 EST
On 07/14/2017 11:13 AM, Michal Hocko wrote:
> On Fri 07-07-17 14:00:03, Vlastimil Babka wrote:
>> On 07/04/2017 07:17 AM, Joonsoo Kim wrote:
>>>> Still, backporting b8f1a75d61d8 fixes this:
>>>> [ 1.538379] allocated 738197504 bytes of page_ext
>>>> [ 1.539340] Node 0, zone DMA: page owner found early allocated 0 pages
>>>> [ 1.540179] Node 0, zone DMA32: page owner found early allocated 33 pages
>>>> [ 1.611173] Node 0, zone Normal: page owner found early allocated 96755 pages
>>>> [ 1.683167] Node 1, zone Normal: page owner found early allocated 96575 pages
>>>> No panic, notice how it allocated more for page_ext, and found smaller number of
>>>> early allocated pages.
>>>> Now backporting fe53ca54270a on top:
>>>> [ 0.000000] allocated 738197504 bytes of page_ext
>>>> [ 0.000000] Node 0, zone DMA: page owner found early allocated 0 pages
>>>> [ 0.000000] Node 0, zone DMA32: page owner found early allocated 33 pages
>>>> [ 0.000000] Node 0, zone Normal: page owner found early allocated 2842622 pages
>>>> [ 0.000000] Node 1, zone Normal: page owner found early allocated 3694362 pages
>>>> Again no panic, and same amount of page_ext usage. But the "early allocated" numbers
>>>> seem bogus to me. I think it's because init_pages_in_zone() is running and inspecting
>>>> struct pages that have not been yet initialized. It doesn't end up crashing, but
>>>> still doesn't seem correct?
>>> Numbers looks sane to me. fe53ca54270a makes init_pages_in_zone()
>>> called before page_alloc_init_late(). So, there would be many
>>> uninitialized pages with PageReserved(). Page owner regarded these
>>> PageReserved() page as allocated page.
>> That seems incorrect for two reasons:
>> - init_pages_in_zone() actually skips PageReserved() pages
>> - the pages don't have PageReserved() flag, until the deferred struct page init
>> thread processes them via deferred_init_memmap() -> __init_single_page() AFAICS
>> Now I've found out why upstream reports much less early allocated pages than our
>> kernel. We're missing 9d43f5aec950 ("mm/page_owner: add zone range overlapping
>> check") which adds a "page_zone(page) != zone" check. I think this only works
>> because the pages are not initialized and thus have no nid/zone links. Probably
>> page_zone() only doesn't break because it's all zeroed. I don't think it's safe
>> to rely on this?
> Yes, if anything PageReserved should be checked before the zone check.
That wouldn't change anything, because we skip PageReserved and it's not
set. Perhaps we could skip pages that have the raw page flags value
zero, but then a) we should make sure that the allocation of the struct
page array zeroes the range, and b) the first modification of struct
page in the initialization is setting the PageReserved flag.