Re: 5.1 and 5.1.1: BUG: unable to handle kernel paging request at ffffea0002030000
From: Justin Piszcz
Date: Tue May 21 2019 - 05:03:49 EST
On Mon, May 20, 2019 at 7:56 AM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sun, May 12, 2019 at 04:27:45AM -0400, Justin Piszcz wrote:
> > Hello,
> >
> > I've turned off zram/zswap and I am still seeing the following during
> > periods of heavy I/O, I am returning to 5.0.xx in the meantime.
> >
> > Kernel: 5.1.1
> > Arch: x86_64
> > Dist: Debian x86_64
> >
> > [29967.019411] BUG: unable to handle kernel paging request at ffffea0002030000
> > [29967.019414] #PF error: [normal kernel read fault]
> > [29967.019415] PGD 103ffee067 P4D 103ffee067 PUD 103ffed067 PMD 0
> > [29967.019417] Oops: 0000 [#1] SMP PTI
> > [29967.019419] CPU: 10 PID: 77 Comm: khugepaged Tainted: G
> > T 5.1.1 #4
> > [29967.019420] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
> > [29967.019424] RIP: 0010:isolate_freepages_block+0xb9/0x310
> > [29967.019425] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49
> > 8d 45 79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef
> > 00 00 00 <48> 8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01
> > 0f 84
>
> If you have debugging symbols installed, can you translate the faulting
> address with the following?
>
> ADDR=`nm /path/to/vmlinux-or-debuginfo-file | grep "t isolate_freepages_block\$" | awk '{print $1}'`
> addr2line -i -e vmlinux `printf "0x%lX" $((0x$ADDR+0xb9))`
Another event this morning, this occurred when copying a single ~25GB
backup file from one block device device (3ware HW RAID) to a SW
RAID-1 (mdadm):
With this event, it was a fault and khugepaged is not stuck at 100%
but this may be related as the stack trace is similar where
compaction_alloc is utilizing most of the CPU:
https://lkml.org/lkml/2019/5/9/225
# ADDR=`nm /usr/src/linux/vmlinux | grep "t isolate_freepages_block\$"
| awk '{print $1}'`
# echo $ADDR
ffffffff812274f0
# addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0x83d))`
compaction.c:?
# addr2line -i -e /usr/src/linux/vmlinux `printf "0x%lX" $((0x$ADDR+0x8d0))`
compaction.c:?
# grep DEBUG_INFO /usr/src/linux/.config-5.1.3-2
CONFIG_DEBUG_INFO=y
I can help test again in a 2-3 weeks if needed but for now I need to
return back to disabling transparent huge pages.
[43775.068702] BUG: unable to handle kernel paging request at ffffea0002250000
[43775.068706] #PF error: [normal kernel read fault]
[43775.068707] PGD 103ffee067 P4D 103ffee067 PUD 103ffed067 PMD 0
[43775.068709] Oops: 0000 [#1] SMP PTI
[43775.068711] CPU: 1 PID: 77 Comm: khugepaged Tainted: G
T 5.1.3 #2
[43775.068712] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 01/16/2015
[43775.068717] RIP: 0010:isolate_freepages_block+0xb9/0x310
[43775.068718] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49
8d 45 79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef
00 00 00 <48> 8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01
0f 84
[43775.068719] RSP: 0018:ffffc900003a7860 EFLAGS: 00010246
[43775.068720] RAX: 0000000000000000 RBX: ffffea0002250000 RCX: ffffc900003a7b69
[43775.068721] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff828c4d90
[43775.068721] RBP: 0000000000089400 R08: 0000000000000001 R09: 0000000000000000
[43775.068722] R10: 0000000000000202 R11: ffffffff828c49d0 R12: 0000000000000000
[43775.068723] R13: 0000000000000000 R14: ffffc900003a7af0 R15: 0000000000000000
[43775.068724] FS: 0000000000000000(0000) GS:ffff88903f840000(0000)
knlGS:0000000000000000
[43775.068725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43775.068725] CR2: ffffea0002250000 CR3: 000000000280e002 CR4: 00000000001606e0
[43775.068726] Call Trace:
[43775.068730] compaction_alloc+0x83d/0x8d0
[43775.068732] migrate_pages+0x30d/0x750
[43775.068734] ? isolate_migratepages_block+0xa10/0xa10
[43775.068735] ? __reset_isolation_suitable+0x110/0x110
[43775.068736] compact_zone+0x684/0xa70
[43775.068738] compact_zone_order+0x109/0x150
[43775.068741] ? schedule_timeout+0x1ba/0x290
[43775.068743] ? record_times+0x13/0xa0
[43775.068744] try_to_compact_pages+0x10d/0x220
[43775.068747] __alloc_pages_direct_compact+0x93/0x180
[43775.068748] __alloc_pages_nodemask+0x6c7/0xe20
[43775.068751] ? __wake_up_common_lock+0xb0/0xb0
[43775.068752] khugepaged+0x31f/0x19c0
[43775.068754] ? __wake_up_common_lock+0xb0/0xb0
[43775.068755] ? __wake_up_common_lock+0xb0/0xb0
[43775.068756] ? collapse_shmem.isra.4+0xc20/0xc20
[43775.068759] kthread+0x10a/0x120
[43775.068761] ? __kthread_create_on_node+0x1b0/0x1b0
[43775.068762] ret_from_fork+0x35/0x40
[43775.068763] CR2: ffffea0002250000
[43775.068764] ---[ end trace 1f63fc0e799750fe ]---
[43775.068766] RIP: 0010:isolate_freepages_block+0xb9/0x310
[43775.068767] Code: 24 28 48 c1 e0 06 40 f6 c5 1f 48 89 44 24 20 49
8d 45 79 48 89 44 24 18 44 89 f0 4d 89 ee 45 89 fd 41 89 c7 0f 84 ef
00 00 00 <48> 8b 03 41 83 c4 01 a9 00 00 01 00 75 0c 48 8b 43 08 a8 01
0f 84
[43775.068767] RSP: 0018:ffffc900003a7860 EFLAGS: 00010246
[43775.068768] RAX: 0000000000000000 RBX: ffffea0002250000 RCX: ffffc900003a7b69
[43775.068769] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff828c4d90
[43775.068770] RBP: 0000000000089400 R08: 0000000000000001 R09: 0000000000000000
[43775.068770] R10: 0000000000000202 R11: ffffffff828c49d0 R12: 0000000000000000
[43775.068771] R13: 0000000000000000 R14: ffffc900003a7af0 R15: 0000000000000000
[43775.068772] FS: 0000000000000000(0000) GS:ffff88903f840000(0000)
knlGS:0000000000000000
[43775.068773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43775.068773] CR2: ffffea0002250000 CR3: 000000000280e002 CR4: 00000000001606e0