Re: [PATCH] mm/memblock:use a more appropriate order calculation when free memblock pages

From: Qian Cai
Date: Fri Dec 04 2020 - 08:44:26 EST


On Thu, 2020-12-03 at 23:23 +0800, carver4lio@xxxxxxx wrote:
> From: Hailong Liu <liu.hailong6@xxxxxxxxxx>
>
> During boot, the pages spanning [start, end] of a memblock are freed to the
> buddy allocator at the largest possible order (less than MAX_ORDER) first;
> the order is then decreased in a loop until the block no longer runs past end.
>
> However, *min(MAX_ORDER - 1UL, __ffs(start))* can not get the largest order
> in some cases.
> Instead, *__ffs(end - start)* may be more appropriate and meaningful.
>
> Signed-off-by: Hailong Liu <liu.hailong6@xxxxxxxxxx>

Reverting this commit on top of today's linux-next fixed boot crashes on
multiple NUMA systems.

[ 5.050736][ T0] flags: 0x3fffc000000000()
[ 5.055103][ T0] raw: 003fffc000000000 ffffea0000000448 ffffea0000000448 0000000000000000
[ 5.063572][ T0] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 5.072045][ T0] page dumped because: VM_BUG_ON_PAGE(pfn & ((1 << order) - 1))
[ 5.079580][ T0] ------------[ cut here ]------------
[ 5.084883][ T0] kernel BUG at mm/page_alloc.c:1015!
[ 5.090151][ T0] invalid opcode: 0000 [#1] SMP KASAN NOPTI
[ 5.095894][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc6-next-20201204+ #11
[ 5.104099][ T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 5.113370][ T0] RIP: 0010:__free_one_page+0xa19/0x1140
[ 5.118864][ T0] Code: d2 e9 69 f6 ff ff 0f 0b 48 c7 c6 e0 52 2d a5 4c 89 ff e8 7a 98 f8 ff 0f 0b 0f 0b 48 c7 c6 60 53 2d a5 4c 89 ff e8 67 98 f8 ff <0f> 0b 48 c7 c6 c0 53 2d a5 4c 89 ff e8 56 98 f8 ff 0f 0b 48 89 da
[ 5.138427][ T0] RSP: 0000:ffffffffa5807c30 EFLAGS: 00010086
[ 5.144367][ T0] RAX: 0000000000000000 RBX: 0000000000000008 RCX: ffffffffa3c4abf4
[ 5.152228][ T0] RDX: 1ffffd400000008f RSI: 0000000000000000 RDI: ffffea0000000478
[ 5.160091][ T0] RBP: 0000000000000007 R08: fffffbfff5918fc5 R09: fffffbfff5918fc5
[ 5.167951][ T0] R10: ffffffffac8c7e23 R11: fffffbfff5918fc4 R12: 0000000000000000
[ 5.175815][ T0] R13: 0000000000000003 R14: ffff88887fff6000 R15: ffffea0000000440
[ 5.183677][ T0] FS: 0000000000000000(0000) GS:ffff88881e800000(0000) knlGS:0000000000000000
[ 5.192499][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.198963][ T0] CR2: ffff88907efff000 CR3: 0000000ce3e14000 CR4: 00000000000406b0
[ 5.206823][ T0] Call Trace:
[ 5.209978][ T0] ? rwlock_bug.part.1+0x90/0x90
[ 5.214774][ T0] free_one_page+0x7e/0x1e0
[ 5.219142][ T0] __free_pages_ok+0x646/0x13b0
[ 5.223863][ T0] memblock_free_all+0x21c/0x2c0
(inlined by) __free_memory_core at mm/memblock.c:2037
(inlined by) free_low_memory_core_early at mm/memblock.c:2060
(inlined by) memblock_free_all at mm/memblock.c:2100
[ 5.228662][ T0] ? reset_all_zones_managed_pages+0x9a/0x9a
[ 5.234515][ T0] ? memblock_alloc_try_nid+0xe6/0x127
[ 5.239842][ T0] ? memblock_alloc_try_nid_raw+0x12a/0x12a
[ 5.245610][ T0] ? early_amd_iommu_init+0x1e1f/0x1e1f
[ 5.251024][ T0] ? iommu_go_to_state+0x24/0x28
[ 5.255831][ T0] mem_init+0x1a/0x350
[ 5.259762][ T0] mm_init+0x5f/0x87
[ 5.263515][ T0] start_kernel+0x14c/0x3a7
[ 5.267882][ T0] ? copy_bootdata+0x19/0x47
[ 5.272340][ T0] secondary_startup_64_no_verify+0xc2/0xcb
[ 5.278102][ T0] Modules linked in:
[ 5.281869][ T0] random: get_random_bytes called from print_oops_end_marker+0x26/0x40 with crng_init=0
[ 5.281878][ T0] ---[ end trace 32dd7228cc16af82 ]---
[ 5.296795][ T0] RIP: 0010:__free_one_page+0xa19/0x1140
[ 5.302299][ T0] Code: d2 e9 69 f6 ff ff 0f 0b 48 c7 c6 e0 52 2d a5 4c 89 ff e8 7a 98 f8 ff 0f 0b 0f 0b 48 c7 c6 60 53 2d a5 4c 89 ff e8 67 98 f8 ff <0f> 0b 48 c7 c6 c0 53 2d a5 4c 89 ff e8 56 98 f8 ff 0f 0b 48 89 da
[ 5.321864][ T0] RSP: 0000:ffffffffa5807c30 EFLAGS: 00010086
[ 5.327803][ T0] RAX: 0000000000000000 RBX: 0000000000000008 RCX: ffffffffa3c4abf4
[ 5.335665][ T0] RDX: 1ffffd400000008f RSI: 0000000000000000 RDI: ffffea0000000478
[ 5.343526][ T0] RBP: 0000000000000007 R08: fffffbfff5918fc5 R09: fffffbfff5918fc5
[ 5.351389][ T0] R10: ffffffffac8c7e23 R11: fffffbfff5918fc4 R12: 0000000000000000
[ 5.359249][ T0] R13: 0000000000000003 R14: ffff88887fff6000 R15: ffffea0000000440
[ 5.367110][ T0] FS: 0000000000000000(0000) GS:ffff88881e800000(0000) knlGS:0000000000000000
[ 5.375932][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.382397][ T0] CR2: ffff88907efff000 CR3: 0000000ce3e14000 CR4: 00000000000406b0
[ 5.390261][ T0] Kernel panic - not syncing: Fatal exception
[ 5.396320][ T0] ---[ end Kernel panic - not syncing: Fatal exception ]---

> ---
> mm/memblock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b68ee8678..7c6d0dde7 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1931,7 +1931,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
>  	int order;
>
>  	while (start < end) {
> -		order = min(MAX_ORDER - 1UL, __ffs(start));
> +		order = min(MAX_ORDER - 1UL, __ffs(end - start));
>
>  		while (start + (1UL << order) > end)
>  			order--;