Re: [PATCH v5 04/18] sparc32: mm: Reduce allocation size for PMD and PTE tables

From: Mike Rapoport
Date: Wed May 20 2020 - 13:03:20 EST


On Mon, May 18, 2020 at 09:37:15AM +0100, Will Deacon wrote:
> On Sat, May 16, 2020 at 05:07:50PM -0700, Guenter Roeck wrote:
> > On Sat, May 16, 2020 at 05:00:50PM -0700, Guenter Roeck wrote:
> > > On Mon, May 11, 2020 at 09:41:36PM +0100, Will Deacon wrote:
> > > > Now that the page table allocator can free page table allocations
> > > > smaller than PAGE_SIZE, reduce the size of the PMD and PTE allocations
> > > > to avoid needlessly wasting memory.
> > > >
> > > > Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
> > > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> > >
> > > Something in the sparc32 patches in linux-next causes all my sparc32 emulations
> > > to crash. bisect points to this patch, but reverting it doesn't help, and neither
> > > does reverting the rest of the series.
> > >
> > Actually, turns out I see the same pattern (lots of scheduling while atomic
> > followed by 'killing interrupt handler' in cryptomgr_test) with several
> > powerpc boot tests. I am currently bisecting those crashes. I'll report
> > the results here as well as soon as I have it.
>
> FWIW, I retested my sparc32 patches with PREEMPT=y and I don't see any
> issues. However, linux-next is a different story, where I don't get very far
> at all:
>
> BUG: Bad page state in process swapper pfn:005b4

This is caused by c03584e30534 ("mm: memmap_init: iterate over memblock
regions rather that check each PFN"). The commit sha is valid for
v5.7-rc6-mmots-2020-05-19-21-52, so it will change in a day or so :)

It seems sparc32 never registered the memory occupied by the kernel
image with memblock_add(); it only reserves this memory with
memblock_reserve().

I don't know what would happen on real HW, but with

qemu-system-sparc -kernel /path/to/kernel

the memory occupied by the kernel is reserved in OpenBIOS and removed
from mem.available. The prom setup code in the kernel uses mem.available
to set up the memory banks, so there is effectively a hole where the
kernel image resides.

Later in bootmem_init() this memory is memblock_reserve()d.

Before the problematic commit, memmap initialization would call
__init_single_page() for the pages in that hole, then
free_low_memory_core_early() would mark them as reserved and everything
would be OK.

After the change in memmap initialization, the hole is skipped and the
page structs for it are not initialized. When these pages are then passed
from memblock to the page allocator as reserved, the page allocator gets
confused.

Simply registering the memory occupied by the kernel with memblock_add()
resolves this issue, at least for qemu-system-sparc, and I cannot see how
it can harm any other setup.
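
To illustrate the sequence above, here is a toy userspace model in C. It
is only a sketch, not kernel code: struct toy_page, struct region,
simulate() and the 16-page layout with the kernel image in PFNs [0, 4)
are all made up for the example. It counts how many reserved pages end
up with an uninitialized page struct under the old per-PFN walk, the new
per-region walk, and the per-region walk with the kernel range added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PFNS    16UL	/* toy machine with 16 page frames */
#define KERNEL_END  4UL	/* toy kernel image occupies PFNs [0, 4) */

struct region { unsigned long start, end; };	/* PFN range [start, end) */
struct toy_page { bool initialized; };		/* stand-in for struct page */

static struct toy_page memmap[NR_PFNS];

/* Stand-in for calling __init_single_page() on each PFN in the range. */
static void init_range(unsigned long start, unsigned long end)
{
	assert(start <= end && end <= NR_PFNS);
	for (unsigned long pfn = start; pfn < end; pfn++)
		memmap[pfn].initialized = true;
}

/*
 * Returns the number of reserved pages whose toy_page was never
 * initialized, i.e. pages that would look bad to the page allocator.
 */
static unsigned long simulate(bool walk_every_pfn, bool kernel_added)
{
	/* Banks built from mem.available: everything except the image. */
	struct region memory[2] = { { KERNEL_END, NR_PFNS } };
	size_t nr_memory = 1;
	const struct region reserved = { 0, KERNEL_END };
	unsigned long bad = 0;

	memset(memmap, 0, sizeof(memmap));

	/* The proposed fix: also register the kernel image as memory. */
	if (kernel_added)
		memory[nr_memory++] = reserved;

	if (walk_every_pfn) {
		/* Old behaviour: every PFN in the span is initialized. */
		init_range(0, NR_PFNS);
	} else {
		/* New behaviour: only registered memory regions are visited. */
		for (size_t i = 0; i < nr_memory; i++)
			init_range(memory[i].start, memory[i].end);
	}

	/* Hand the reserved range over and count uninitialized pages. */
	for (unsigned long pfn = reserved.start; pfn < reserved.end; pfn++)
		if (!memmap[pfn].initialized)
			bad++;

	return bad;
}
```

In this model simulate(true, false) and simulate(false, true) return 0,
while simulate(false, false) returns 4: the four toy kernel pages are
reserved but were never initialized.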

If all that makes sense I'll send a proper patch :)

diff --git a/arch/sparc/mm/init_32.c b/arch/sparc/mm/init_32.c
index 906eda1158b4..3cb3dffcbcdc 100644
--- a/arch/sparc/mm/init_32.c
+++ b/arch/sparc/mm/init_32.c
@@ -193,6 +193,7 @@ unsigned long __init bootmem_init(unsigned long *pages_avail)
 	/* Reserve the kernel text/data/bss. */
 	size = (start_pfn << PAGE_SHIFT) - phys_base;
 	memblock_reserve(phys_base, size);
+	memblock_add(phys_base, size);

 	size = memblock_phys_mem_size() - memblock_reserved_size();
 	*pages_avail = (size >> PAGE_SHIFT) - high_pages;

> Will

--
Sincerely yours,
Mike.