Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone
From: Michal Hocko
Date: Thu Mar 15 2018 - 07:51:04 EST
On Thu 15-03-18 02:30:41, Daniel Vacek wrote:
> On Wed, Mar 14, 2018 at 3:17 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Tue 13-03-18 23:42:40, Daniel Vacek wrote:
> >> On some architectures (reported on arm64) commit 864b75f9d6b01 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
> >> causes a boot hang. This patch fixes the hang making sure the alignment
> >> never steps back.
> >
> > I am sorry to be complaining again, but the code is so obscure that I
>
> No worries, I'm glad for any review. Which code exactly you do find
> obscure? This patch or my former fix or the original commit
> introducing memblock_next_valid_pfn()? Coz I'd agree the original
> commit looks pretty obscure...
As mentioned in the other email, the whole going back and forth in the
same loop is just too ugly to live.
> > would _really_ appreciate some more information about what is going
> > on here. memblock_next_valid_pfn will most likely return a pfn within
> > the same memblock and the alignment will move it before the old pfn
> > which is not valid - so the block has some holes. Is that correct?
>
> I do not understand what you mean by 'pfn within the same memblock'?
Sorry, I should have said in the same pageblock
> And by 'the block has some holes'?
memblock_next_valid_pfn clearly returns pfn which is within a pageblock
and that is why we do not initialize pages in the begining of the block
while move_freepages_block does really expect the full pageblock to be
initialized properly. That is the fundamental problem, right?
> memblock has types 'memory' (as usable memory) and 'reserved' (for
> unusable mem), if I understand correctly.
We might not have struct pages for invalid pfns. That really depends on
the memory mode. Sure sparse mem model will usually allocate struct
pages for whole memory sections but that is not universally true and
adding such a suble assumption is simply wrong.
I suspect you are making strong assumptions based on a very specific
implementation which might be not true in general. That was the feeling
I've had since the patch was proposed for the first time. This is such a
cluttered area that I am not really sure myself, thoug.
--
Michal Hocko
SUSE Labs