Re: 5.0-rc kernel hangs on early boot

From: Mel Gorman
Date: Wed Feb 13 2019 - 06:21:48 EST


On Wed, Feb 13, 2019 at 11:18:44AM +0000, Will Deacon wrote:
> Hi Yury,
>
> On Wed, Feb 13, 2019 at 11:25:40AM +0300, Yury Norov wrote:
> > My kernel on qemu/arm64 setup hangs at early boot since v5.0-rc1.
> > Backtrace is not too verbose:
> > (gdb) i threads
> > Id Target Id Frame
> > * 1 Thread 1 (CPU#0 [running]) 0xffff000010a49b74 in __delay (cycles=4096)
> > at arch/arm64/lib/delay.c:49
> > 2 Thread 2 (CPU#1 [halted ]) 0x0000000000000000 in ?? ()
> > 3 Thread 3 (CPU#2 [halted ]) 0x0000000000000000 in ?? ()
> > 4 Thread 4 (CPU#3 [halted ]) 0x0000000000000000 in ?? ()
> > (gdb) bt
> > #0 0xffff000010a49b74 in __delay (cycles=4096) at arch/arm64/lib/delay.c:49
> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> >
> > Reverting the patch
> > 1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
> > fragmentation event occurs") together with following patch
> > 73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
> > helps me to boot normally.
> >
> > Some system information is below, and config is attached.
>
> FWIW, running with your command-line and .config under KVM with earlycon
> leads to an early page allocation failure followed by a NULL dereference
> during boot if only 1G is configured (log below). For the mm folks, it's
> probably worth pointing out that you're using 64k pages.
>

Thanks Will.

While I agree that going OOM early is a problem and would explain why
the boosting logic was hit at all, it's still the case that the boosting
should not divide by zero. Even if boot is broken due to a lack of
memory, I'd still prefer not to crash due to 1c30844d2dfe272d58c.
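
For illustration only, here is a minimal userspace sketch of the kind of
guard this implies. It is not the kernel code and not a proposed patch.
The names scale_frac() and next_boost(), the 1/10000 units for the boost
factor, and the step parameter are all assumptions made for the example.
The idea is simply that when the boost ceiling computes to zero (for
example because the high watermark is still zero very early in boot),
the boost is left untouched, so nothing later has a zero ceiling to
divide by.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute x * numer / denom without overflowing the
 * intermediate product. The caller must guarantee denom != 0.
 */
static uint64_t scale_frac(uint64_t x, uint64_t numer, uint64_t denom)
{
	uint64_t quot = x / denom;
	uint64_t rem = x % denom;

	return quot * numer + (rem * numer) / denom;
}

/*
 * Hypothetical boost step. Returns the new boost value given the current
 * boost, the zone's high watermark, a boost factor expressed in units of
 * 1/10000, and the amount to add per fragmentation event.
 */
static uint64_t next_boost(uint64_t cur_boost, uint64_t high_wmark,
			   uint64_t boost_factor, uint64_t step)
{
	uint64_t max_boost;

	/* Boosting disabled: leave the current value alone. */
	if (boost_factor == 0)
		return cur_boost;

	max_boost = scale_frac(high_wmark, boost_factor, 10000);

	/*
	 * A zero ceiling means the watermark is not set up yet or the zone
	 * is tiny. Do not boost, so later arithmetic never sees a zero
	 * ceiling.
	 */
	if (max_boost == 0)
		return cur_boost;

	/* Raise the boost by one step, clamped to the ceiling. */
	cur_boost += step;
	return cur_boost < max_boost ? cur_boost : max_boost;
}
```

With a high watermark of 0 the function returns the boost unchanged,
and with a high watermark of 1000 and a factor of 15000 the boost is
clamped at 1500.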

--
Mel Gorman
SUSE Labs