RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

From: Liang, Liang (Leo)
Date: Tue Mar 16 2021 - 04:43:59 EST


[AMD Public Use]

Hi David,

Thanks for your explanation. We saw the slow boot issue on our farm/QA machines as well as on mine. All of the machines use the same SoC/board.

BRs,
Leo
-----Original Message-----
From: David Hildenbrand <david@xxxxxxxxxx>
Sent: Tuesday, March 16, 2021 4:38 PM
To: Liang, Liang (Leo) <Liang.Liang@xxxxxxx>; Mike Rapoport <rppt@xxxxxxxxxxxxx>
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Rafael J. Wysocki <rafael@xxxxxxxxxx>; George Kennedy <george.kennedy@xxxxxxxxxx>
Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

On 16.03.21 09:00, Liang, Liang (Leo) wrote:
> [AMD Public Use]
>
> Hi Mike,
>
> Thanks for the help. The patch works for me and boot time is back to normal. So is it a fix, or just a workaround (WA)?

Hi Leo,

Excluding up to 16 MiB of memory on every system just because that single platform is weird is not acceptable.

I think we have to figure out

a) why that memory is so special. This is weird.
b) why the platform doesn't indicate it in a special way. Why is it ordinary system RAM but still *that* slow?
c) how we can reliably identify such memory and exclude it.

I'll have a peek at the memory layout of that machine from boot logs next to figure out if we can answer any of these questions.
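
For anyone who wants to do the same kind of check, here is a minimal sketch of pulling the firmware memory map out of a saved boot log. It assumes the usual x86 "BIOS-e820: [mem 0x...-0x...] type" line format; the helper names (parse_e820, usable_below) and the sample log lines are made up for illustration and are not taken from the affected machine. The 16 MiB limit in the example is just the figure mentioned above, not a claim about where the slow memory actually sits.

```python
import re

# Matches x86 firmware memory-map lines as printed early in dmesg, e.g.
#   [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
E820_RE = re.compile(
    r"BIOS-e820: \[mem 0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\] (.+?)\s*$"
)


def parse_e820(log_text):
    """Return a list of (start, end_inclusive, type) tuples from a boot log."""
    regions = []
    for line in log_text.splitlines():
        m = E820_RE.search(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def usable_below(regions, limit):
    """Bytes of 'usable' RAM located below the given physical address."""
    total = 0
    for start, end, kind in regions:
        if kind == "usable" and start < limit:
            # The end address in the log is inclusive, hence the +1.
            total += min(end + 1, limit) - start
    return total


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = """\
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000009afffff] usable
"""

if __name__ == "__main__":
    regions = parse_e820(SAMPLE_LOG)
    for start, end, kind in regions:
        print(f"{start:#014x}-{end:#014x} {kind}")
    print("usable below 16 MiB:", usable_below(regions, 16 << 20), "bytes")
```

Running this over the logs of an affected and an unaffected machine and comparing the listed ranges is one way to spot a region that is reported as plain "usable" RAM on both but only behaves slowly on one.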

Just to verify: this does happen on multiple machines, not just a single one? (i.e., we're not dealing with faulty RAM)

--
Thanks,

David / dhildenb