Re: [PATCH v5 1/1] mm: refactor initialization of struct page for holes in memory layout
From: Vlastimil Babka
Date: Tue Feb 16 2021 - 11:40:22 EST
On 2/16/21 2:11 PM, Michal Hocko wrote:
> On Tue 16-02-21 13:34:56, Vlastimil Babka wrote:
>> On 2/16/21 12:01 PM, Mike Rapoport wrote:
>> >>
>> >> I do understand that. And I am not objecting to the patch. I have to
>> >> confess I haven't digested it yet. Any changes to early memory
>> >> intialization have turned out to be subtle and corner cases only pop up
>> >> later. This is almost impossible to review just by reading the code.
>> >> That's why I am asking whether we want to address the specific VM_BUG_ON
>> >> first with something much less tricky and actually reviewable. And
>> >> that's why I am asking whether dropping the bug_on itself is safe to do
>> >> and use as a hot fix which should be easier to backport.
>> >
>> > I can't say I'm familiar enough with migration and compaction code to say
>> > if it's ok to remove that bug_on. It does point to inconsistency in the
>> > memmap, but probably it's not important.
>>
>> On closer look, removing the VM_BUG_ON_PAGE() in set_pfnblock_flags_mask() is
>> not safe. If we violate the zone_spans_pfn condition, it means we will write
>> outside of the pageblock bitmap for the zone, and corrupt something.
>
> Isn't it enough that at least some pfn from the pageblock belongs to the
> zone in order to have the bitmap allocated for the whole page block
> (even if it partially belongs to a different zone)?
>
>> Actually
>> similar thing can happen in __get_pfnblock_flags_mask() where there's no
>> VM_BUG_ON, but there we can't corrupt memory. But we could theoretically fault
>> to do accessing some unmapped range?
>>
>> So the checks would have to become unconditional !DEBUG_VM and return instead of
>> causing a BUG. Or we could go back one level and add some checks to
>> fast_isolate_around() to detect a page from zone that doesn't match cc->zone.
>
> Thanks for looking deeper into that. This sounds like a much more
> targeted fix to me.
So, Andrea could you please check if this fixes the original
fast_isolate_around() issue for you? With the VM_BUG_ON not removed, DEBUG_VM
enabled, no changes to struct page initialization...
It relies on pageblock_pfn_to_page as the rest of the compaction code.
Thanks!
----8<----