Re: [PATCH v2] Revert "mm/page_alloc: fix memmap_init_zone pageblock alignment"
From: Ard Biesheuvel
Date: Thu Mar 15 2018 - 11:48:53 EST
On 15 March 2018 at 15:34, Daniel Vacek <neelx@xxxxxxxxxx> wrote:
> On Thu, Mar 15, 2018 at 4:17 PM, Ard Biesheuvel
> <ard.biesheuvel@xxxxxxxxxx> wrote:
>> On 15 March 2018 at 15:12, Daniel Vacek <neelx@xxxxxxxxxx> wrote:
>>> On Thu, Mar 15, 2018 at 8:45 AM, Ard Biesheuvel
>>> <ard.biesheuvel@xxxxxxxxxx> wrote:
>>>> On 15 March 2018 at 07:44, Daniel Vacek <neelx@xxxxxxxxxx> wrote:
>>>>> On Thu, Mar 15, 2018 at 8:36 AM, Ard Biesheuvel
>>>>> <ard.biesheuvel@xxxxxxxxxx> wrote:
>>>>>> On 15 March 2018 at 02:23, Daniel Vacek <neelx@xxxxxxxxxx> wrote:
>>>>>>> On Wed, Mar 14, 2018 at 8:29 PM, Ard Biesheuvel
>>>>>>> <ard.biesheuvel@xxxxxxxxxx> wrote:
>>>>>>>> This reverts commit 864b75f9d6b0100bb24fdd9a20d156e7cda9b5ae.
>>>>>>>>
>>>>>>>> Commit 864b75f9d6b0 ("mm/page_alloc: fix memmap_init_zone pageblock
>>>>>>>> alignment") modified the logic in memmap_init_zone() to initialize
>>>>>>>> struct pages associated with invalid PFNs, to appease a VM_BUG_ON()
>>>>>>>> in move_freepages(), which is redundant by its own admission, and
>>>>>>>> dereferences struct page fields to obtain the zone without checking
>>>>>>>> whether the struct pages in question are valid to begin with.
>>>>>>>>
>>>>>>>> Commit 864b75f9d6b0 only makes it worse, since the rounding it does
>>>>>>>> may cause pfn assume the same value it had in a prior iteration of
>>>>>>>> the loop, resulting in an infinite loop and a hang very early in the
>>>>>>>> boot. Also, since it doesn't perform the same rounding on start_pfn
>>>>>>>> itself but only on intermediate values following an invalid PFN, we
>>>>>>>> may still hit the same VM_BUG_ON() as before.
>>>>>>>>
>>>>>>>> So instead, let's fix this at the core, and ensure that the BUG
>>>>>>>> check doesn't dereference struct page fields of invalid pages.
>>>>>>>>
>>>>>>>> Fixes: 864b75f9d6b0 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
>>>>>>>> Cc: Daniel Vacek <neelx@xxxxxxxxxx>
>>>>>>>> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>>>>>>>> Cc: Michal Hocko <mhocko@xxxxxxxx>
>>>>>>>> Cc: Paul Burton <paul.burton@xxxxxxxxxx>
>>>>>>>> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>>>>>>>> Cc: Vlastimil Babka <vbabka@xxxxxxx>
>>>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>>>>>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> mm/page_alloc.c | 13 +++++--------
>>>>>>>> 1 file changed, 5 insertions(+), 8 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>> index 3d974cb2a1a1..635d7dd29d7f 100644
>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>> @@ -1910,7 +1910,9 @@ static int move_freepages(struct zone *zone,
>>>>>>>> * Remove at a later date when no bug reports exist related to
>>>>>>>> * grouping pages by mobility
>>>>>>>> */
>>>>>>>> - VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
>>>>>>>> + VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
>>>>>>>> + pfn_valid(page_to_pfn(end_page)) &&
>>>>>>>> + page_zone(start_page) != page_zone(end_page));
>>>>>>>
>>>>>>> Hi, I am on vacation this week and I didn't have a chance to test this
>>>>>>> yet but I am not sure this is correct. Generic pfn_valid() unlike the
>>>>>>> arm{,64} arch specific versions returns true for all pfns in a section
>>>>>>> if there is at least some memory mapped in that section. So I doubt
>>>>>>> this prevents the crash I was targeting. I believe pfn_valid() does
>>>>>>> not change a thing here :(
>>>>>>>
>>>>>>
>>>>>> If this is the case, memblock_next_valid_pfn() is broken since it
>>>>>> skips valid PFNs, and we should be fixing that instead.
>>>>>
>>>>> How do you define valid pfn? Maybe the generic version of pfn_valid()
>>>>> should be fixed???
>>>>>
>>>>
>>>> memblock_next_valid_pfn() skips PFNs for which pfn_valid() returns
>>>> true. That is clearly a bug.
>>>
>>> So pfn_valid() does not mean this frame is usable memory?
>>>
>>
>> Who cares what it *means*?
>
> abstractions?
>
>> memblock_next_valid_pfn() has 'valid_pfn' in its name, so if passing
>> pfn A returns B, and there exists a C such that A < C < B and
>> pfn_valid(C) returns true, memblock_next_valid_pfn doesn't do what it
>> says on the tin and should be fixed.
>
> If you don't like the name I would argue it should be changed to
> something like memblock_pfn_of_next_memory(). Still the name is not
> next_valid_pfn() but memblock_next... kind of distinguishing something
> different.
>
Naming is important. If the name would have reflected what it actually
does, perhaps it would have taken us less time to figure out that what
it's doing is wrong.
>> You keep going on about how pfn_valid() does or does not do what you
>> think, but that is really irrelevant.
>
> I'm trying to learn.
So am I :-)
> Hence I was asking what is the abstract meaning
> of it. As I see two *way_different* implementations so I am not sure
> how I should understand that.
>
My interpretation is that it has a struct page associated with it, but
it seems the semantics of pfn_valid() aren't well defined. IIRC there
are places in the kernel that assume that valid PFNs are covered by
the kernel direct mapping (on non-highmem kernels), and this is why we
have a separate definition for arm64, which needs to remove some
regions from the kernel direct mapping because the architecture does
not permit mappings with mismatched attributes, and these regions may
be mapped by the firmware as well.
Dereferencing the struct page associated with a PFN for which
pfn_valid() returns false is wrong in any case, which is why the
VM_BUG_ON() you identified was buggy as well. But on the other hand,
that does mean we need to guarantee that all valid PFNs have their
struct page initialized.
>>> OK, in that case we need to fix or revert memblock_next_valid_pfn().
>>> That works for me.
>>>
>>
>> OK. You can add my ack to a patch that reverts it, and we can revisit
>> it for the next cycle.
>
> Cool. I'll do that next week. Thank you, sir.
>
Likewise.