Re: [PATCH] mm, page_alloc: check pfn is valid before moving to freelist

From: Mike Rapoport
Date: Wed Apr 13 2022 - 16:49:15 EST


On Tue, Apr 12, 2022 at 01:16:23PM -0700, Sudarshan Rajagopalan wrote:
> Check whether a pfn is valid or not before moving it to the freelist.
>
> There are possible scenarios where a pageblock can contain a partial
> physical hole and a partial part of System RAM. This happens when the base
> address in the RAM partition table is not aligned to the pageblock size.
>
> Example:
>
> Say we have this first two entries in RAM partition table -
>
> Base Addr: 0x0000000080000000 Length: 0x0000000058000000
> Base Addr: 0x00000000E3930000 Length: 0x0000000020000000

I wonder what was done to memory DIMMs to get such an interesting
physical memory layout...

> ...
>
> Physical hole: 0xD8000000 - 0xE3930000
>
> On a system having a 4K page size, and hence a pageblock size of 4MB, the
> base address 0xE3930000 is not aligned to the 4MB pageblock size.
>
> Now we will have a pageblock which contains a partial physical hole and a
> partial part of System RAM -
>
> Pageblock [0xE3800000 - 0xE3C00000] -
> 0xE3800000 - 0xE3930000 -- physical hole
> 0xE3930000 - 0xE3C00000 -- System RAM
>
> Now, when doing __alloc_pages, say we get a valid page with PFN 0xE3B00
> from __rmqueue_fallback; we then try to put the other pages from the same
> pageblock into the freelist as well by calling steal_suitable_fallback().
>
> We then search for freepages from the start of the pageblock due to the code below -
>
> move_freepages_block(zone, page, migratetype, ...)
> {
> 	pfn = page_to_pfn(page);
> 	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> 	end_pfn = start_pfn + pageblock_nr_pages - 1;
> 	...
> }
>
> With a pageblock that has a partial physical hole at its beginning, we will
> run into PFNs from the physical hole whose struct pages are not initialized
> and are invalid, and the system would crash as we operate on an invalid
> struct page to find out whether the page is in the Buddy allocator or on
> the LRU.

struct page must be initialized and valid even for holes in the physical
memory. When a pageblock spans both existing memory and a hole, the struct
pages for the "hole" part should be marked as PG_Reserved.

If you see that struct pages for memory holes exist but are invalid, we
should solve the underlying issue that causes the wrong struct page contents.

> [ 107.629453][ T9688] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [ 107.639214][ T9688] Mem abort info:
> [ 107.642829][ T9688] ESR = 0x96000006
> [ 107.646696][ T9688] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 107.652878][ T9688] SET = 0, FnV = 0
> [ 107.656751][ T9688] EA = 0, S1PTW = 0
> [ 107.660705][ T9688] FSC = 0x06: level 2 translation fault
> [ 107.666455][ T9688] Data abort info:
> [ 107.670151][ T9688] ISV = 0, ISS = 0x00000006
> [ 107.674827][ T9688] CM = 0, WnR = 0
> [ 107.678615][ T9688] user pgtable: 4k pages, 39-bit VAs, pgdp=000000098a237000
> [ 107.685970][ T9688] [0000000000000000] pgd=0800000987170003, p4d=0800000987170003, pud=0800000987170003, pmd=0000000000000000
> [ 107.697582][ T9688] Internal error: Oops: 96000006 [#1] PREEMPT SMP
>
> [ 108.209839][ T9688] pc : move_freepages_block+0x174/0x27c

can you post faddr2line output for this address?

> [ 108.215407][ T9688] lr : steal_suitable_fallback+0x20c/0x398
>
> [ 108.305908][ T9688] Call trace:
> [ 108.309151][ T9688] move_freepages_block+0x174/0x27c [PageLRU]
> [ 108.314359][ T9688] steal_suitable_fallback+0x20c/0x398
> [ 108.319826][ T9688] rmqueue_bulk+0x250/0x934
> [ 108.324325][ T9688] rmqueue_pcplist+0x178/0x2ac
> [ 108.329086][ T9688] rmqueue+0x5c/0xc10
> [ 108.333048][ T9688] get_page_from_freelist+0x19c/0x430
> [ 108.338430][ T9688] __alloc_pages+0x134/0x424
> [ 108.343017][ T9688] page_cache_ra_unbounded+0x120/0x324
> [ 108.348494][ T9688] do_sync_mmap_readahead+0x1b0/0x234
> [ 108.353878][ T9688] filemap_fault+0xe0/0x4c8
> [ 108.358375][ T9688] do_fault+0x168/0x6cc
> [ 108.362518][ T9688] handle_mm_fault+0x5c4/0x848
> [ 108.367280][ T9688] do_page_fault+0x3fc/0x5d0
> [ 108.371867][ T9688] do_translation_fault+0x6c/0x1b0
> [ 108.376985][ T9688] do_mem_abort+0x68/0x10c
> [ 108.381389][ T9688] el0_ia+0x50/0xbc
> [ 108.385175][ T9688] el0t_32_sync_handler+0x88/0xbc
> [ 108.390208][ T9688] el0t_32_sync+0x1b8/0x1bc
>
> Hence, avoid operating on invalid pages within the same pageblock by
> checking whether the pfn is valid or not.

> Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@xxxxxxxxxxx>
> Fixes: 4c7b9896621be ("mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE")
> Cc: Mike Rapoport <rppt@xxxxxxxxxxxxx>

For now the patch looks like a band-aid for a more fundamental bug, so

Nacked-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>


> Cc: Anshuman Khandual <anshuman.khandual@xxxxxxx>
> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> ---
> mm/page_alloc.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6e5b448..e87aa053 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2521,6 +2521,11 @@ static int move_freepages(struct zone *zone,
> 	int pages_moved = 0;
>
> 	for (pfn = start_pfn; pfn <= end_pfn;) {
> +		if (!pfn_valid(pfn)) {
> +			pfn++;
> +			continue;
> +		}
> +
> 		page = pfn_to_page(pfn);
> 		if (!PageBuddy(page)) {
> 			/*
> --
> 2.7.4
>

--
Sincerely yours,
Mike.