Re: 5.1 and 5.1.1: BUG: unable to handle kernel paging request at ffffea0002030000

From: Oleksandr Natalenko
Date: Fri May 24 2019 - 07:46:34 EST


On Tue, May 21, 2019 at 01:43:10PM +0100, Mel Gorman wrote:
> On Tue, May 21, 2019 at 05:01:06AM -0400, Justin Piszcz wrote:
> > On Mon, May 20, 2019 at 7:56 AM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Sun, May 12, 2019 at 04:27:45AM -0400, Justin Piszcz wrote:
> > > > Hello,
> > > >
> > > > I've turned off zram/zswap and I am still seeing the following during
> > > > periods of heavy I/O, I am returning to 5.0.xx in the meantime.
> > > >
> > > > Kernel: 5.1.1
> > > > Arch: x86_64
> > > > Dist: Debian x86_64
> > > >
> > > > [29967.019411] BUG: unable to handle kernel paging request at ffffea0002030000
> > > > [29967.019414] #PF error: [normal kernel read fault]
> > > > [29967.019415] PGD 103ffee067 P4D 103ffee067 PUD 103ffed067 PMD 0
> > > > [29967.019417] Oops: 0000 [#1] SMP PTI
> > > > [29967.019419] CPU: 10 PID: 77 Comm: khugepaged Tainted: G      T 5.1.1 #4
> > > > [29967.019420] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
> > > > [29967.019424] RIP: 0010:isolate_freepages_block+0xb9/0x310
> > > > [29967.019425] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49
> > > > 8d 45 79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef
> > > > 00 00 00 <48> 8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01
> > > > 0f 84
> > >
> > > If you have debugging symbols installed, can you translate the faulting
> > > address with the following?
> > >
> > > ADDR=`nm /path/to/vmlinux-or-debuginfo-file | grep "t isolate_freepages_block\$" | awk '{print $1}'`
> > > addr2line -i -e vmlinux `printf "0x%lX" $((0x$ADDR+0xb9))`
> >
> > Another event this morning: this occurred when copying a single ~25GB
> > backup file from one block device (3ware HW RAID) to a SW RAID-1
> > (mdadm):
> >
> > With this event it was a fault, and khugepaged was not stuck at 100%,
> > but it may be related, as the stack trace is similar and
> > compaction_alloc is using most of the CPU:
> > https://lkml.org/lkml/2019/5/9/225
> >
> > # ADDR=`nm /usr/src/linux/vmlinux | grep "t isolate_freepages_block\$" | awk '{print $1}'`
> > # echo $ADDR
> > ffffffff812274f0
> > # addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0x83d))`
> > compaction.c:?
> > # addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0x8d0))`
> > compaction.c:?
> >
>
> Please use the offset 0xb9
>
> addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0xb9))`
>
> --
> Mel Gorman
> SUSE Labs

Cc'ing myself, since I sometimes observe such behaviour right after a
KVM VM is launched. No luck reproducing it on demand so far, though.
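
In case it helps with triage, below is a rough helper that wraps the
nm/addr2line steps quoted above into one script. This is only a sketch:
the resolve-offset.sh name and the default /usr/src/linux/vmlinux path
are assumptions, so adjust them to wherever your debug-info vmlinux
lives.

#!/bin/sh
# resolve-offset.sh -- translate SYMBOL+OFFSET into a source line,
# following the same nm/addr2line steps as above.
# Usage: resolve-offset.sh [vmlinux] [symbol] [offset]
VMLINUX=${1:-/usr/src/linux/vmlinux}   # vmlinux built with debug info (assumed path)
SYMBOL=${2:-isolate_freepages_block}
OFFSET=${3:-0xb9}                      # offset taken from the RIP line of the oops
# Start address of the symbol from the symbol table (text symbol, t/T).
ADDR=$(nm "$VMLINUX" | awk -v s="$SYMBOL" '$3 == s && $2 ~ /[tT]/ {print $1}')
# Add the faulting offset and map the result back to file:line,
# including inlined callers (-i).
addr2line -i -e "$VMLINUX" "$(printf '0x%lX' $((0x$ADDR + $OFFSET)))"
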

--
Best regards,
Oleksandr Natalenko (post-factum)
Senior Software Maintenance Engineer