Re: cpu_needs_another_gp: unable to handle kernel paging request

From: Alex Lyakas
Date: Thu Sep 07 2017 - 03:48:09 EST


Hello Paul,

Thank you for your response.

Can you give us a hint as to what this panic indicates? Random kernel memory corruption? Improper use of an RCU primitive? A hardware issue?

This happened only once, on one of our production systems, and unfortunately we don't have a reproduction scenario.

Thanks,
Alex.


-----Original Message-----
From: Paul E. McKenney
Sent: Wednesday, September 06, 2017 6:02 PM
To: Alex Lyakas
Cc: josh@xxxxxxxxxxxxxxxx ; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: cpu_needs_another_gp: unable to handle kernel paging request

On Wed, Sep 06, 2017 at 12:53:42PM +0300, Alex Lyakas wrote:
Hello,

Kernel 3.18.19 hit the following panic [1]. Can you please advise on
how to debug this further, or tell us whether this is a known issue
that you recognize?

Thanks,
Alex.


[1]
Sep 5 01:05:02.092499 vsa-0000000f-vc-0 kernel: [1294776.890064]
BUG: unable to handle kernel paging request at fffffffffffffeda
Sep 5 01:05:02.092517 vsa-0000000f-vc-0 kernel: [1294776.890892]
IP: [<ffffffff810d12e5>] cpu_needs_another_gp+0x25/0x80
Sep 5 01:05:02.092517 vsa-0000000f-vc-0 kernel: [1294776.891007]
PGD 1c19067 PUD 1c1b067 PMD 0
Sep 5 01:05:02.092518 vsa-0000000f-vc-0 kernel: [1294776.891007]
Oops: 0002 [#1] PREEMPT SMP
Sep 5 01:05:02.092520 vsa-0000000f-vc-0 kernel: [1294776.891007]
Modules linked in: xt_nat(E) veth(E) xt_addrtype(E) br_netfilter(E)
xfrm_user(E) xfrm4_tunnel(E) tunnel4(E) ipcomp(E) xfrm_ipcomp(E)
esp4(E) ah4(E) 8021q(E) garp(E) mrp(E) xt_multiport(E) sd_mod(E)
bonding(E) ib_iser(OE) iscsi_tcp(OE) libiscsi_tcp(OE) libiscsi(OE)
scsi_transport_iscsi(OE) dm_zcache(OE) xfs(OE) btrfs(OE) raid456(OE)
async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E)
async_tx(E) raid6_pq(E) raid1(OE) md_mod(OE) rdma_ucm(OE)
ib_uverbs(OE) mlx4_ib(OE) mlx4_en(OE) ipt_MASQUERADE(E)
nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E)
nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E)
nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_CHECKSUM(E)
iptable_mangle(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) vxlan(E)
ip6_udp_tunnel(E) udp_tunnel(E) ptp(E) pps_core(E)
ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E)
x_tables(E) mlx4_core(OE) deflate(E) ctr(E) twofish_generic(E)
twofish_avx_x86_64(E) twofish_x86_64_3way(E) twofish_x86_64(E)
twofish_common(E) camellia_generic(E) camellia_aesni_avx2(E)
camellia_aesni_avx_x86_64(E) camellia_x86_64(E) serpent_avx2(E)
serpent_avx_x86_64(E) serpent_sse2_x86_64(E) xts(E)
serpent_generic(E) blowfish_generic(E) blowfish_x86_64(E)
blowfish_common(E) cast5_avx_x86_64(E) cast5_generic(E)
cast_common(E) des3_ede_x86_64(E) des_generic(E) cmac(E) xcbc(E)
rmd160(E) isert_scst(OE) crypto_null(E) rdma_cm(OE) af_key(E)
iw_cm(OE) xfrm_algo(E) ib_cm(OE) ib_sa(OE) ib_mad(OE) ib_core(OE)
ib_addr(OE) compat(OE) iscsi_scst(OE) scst_utgt(OE) scst_vdisk(OE)
libcrc32c(E) scst(OE) nls_iso8859_1(E) kvm_intel(E) kvm(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
aesni_intel(E) nfsd(OE) aes_x86_64(E) lrw(E) gf128mul(E)
glue_helper(E) ablk_helper(E) cryptd(E) auth_rpcgss(E) nfs_acl(E)
mac_hid(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E)
dm_multipath(OE) scsi_dh(E) ttm(E) drm_kms_helper(E) serio_raw(E)
drm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) i2c_piix4(E)
i6300esb(E) lp(E) parport(E) dm_iostat(OE) ata_generic(E)
pata_acpi(E) ata_piix(E) libata(E) psmouse(E) scsi_mod(OE)
Sep 5 01:05:02.092522 vsa-0000000f-vc-0 kernel: [1294776.892666]
CPU: 5 PID: 14385 Comm: aws Tainted: G W OE
3.18.19-zadara05 #1
Sep 5 01:05:02.092523 vsa-0000000f-vc-0 kernel: [1294776.892666]
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Sep 5 01:05:02.092524 vsa-0000000f-vc-0 kernel: [1294776.892666]
task: ffff880022da6540 ti: ffff88000a9a4000 task.ti:
ffff88000a9a4000
Sep 5 01:05:02.092525 vsa-0000000f-vc-0 kernel: [1294776.892666]
RIP: 0010:[<ffffffff810d12e5>] [<ffffffff810d12e5>]
cpu_needs_another_gp+0x25/0x80
Sep 5 01:05:02.092525 vsa-0000000f-vc-0 kernel: [1294776.892666]
RSP: 0000:ffff8808bfca3e88 EFLAGS: 00010097
Sep 5 01:05:02.092526 vsa-0000000f-vc-0 kernel: [1294776.892666]
RAX: 0000000000000000 RBX: ffffffff81c55c40 RCX: fffffffffffffeda
Sep 5 01:05:02.092526 vsa-0000000f-vc-0 kernel: [1294776.892666]
RDX: fffffffffffffeda RSI: ffff8808bfcad600 RDI: ffffffff81c55c40
Sep 5 01:05:02.092527 vsa-0000000f-vc-0 kernel: [1294776.892666]
RBP: ffff8808bfca3e88 R08: 00000000000021ac R09: 0000000000000100
Sep 5 01:05:02.092527 vsa-0000000f-vc-0 kernel: [1294776.892666]
R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000246
Sep 5 01:05:02.092529 vsa-0000000f-vc-0 kernel: [1294776.892666]
R13: 0000000000000009 R14: 0000000000000100 R15: ffff8808bfcad600
Sep 5 01:05:02.092531 vsa-0000000f-vc-0 kernel: [1294776.892666]
FS: 00007f158f7fe700(0000) GS:ffff8808bfca0000(0000)
knlGS:0000000000000000
Sep 5 01:05:02.092531 vsa-0000000f-vc-0 kernel: [1294776.892666]
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 5 01:05:02.092533 vsa-0000000f-vc-0 kernel: [1294776.892666]
CR2: fffffffffffffeda CR3: 0000000741e12000 CR4: 00000000003407e0
Sep 5 01:05:02.092554 vsa-0000000f-vc-0 kernel: [1294776.892666]
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 5 01:05:02.092566 vsa-0000000f-vc-0 kernel: [1294776.892666]
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 5 01:05:02.092568 vsa-0000000f-vc-0 kernel: [1294776.892666] Stack:
Sep 5 01:05:02.092569 vsa-0000000f-vc-0 kernel: [1294776.892666]
ffff8808bfca3ef8 ffffffff810d491c ffff88088e17d838 ffff88088e17d438
Sep 5 01:05:02.092571 vsa-0000000f-vc-0 kernel: [1294776.892666]
ffff880022da6540 ffff88000a9a7fd8 ffff8808bfca3eb8 ffff880799cad868
Sep 5 01:05:02.092572 vsa-0000000f-vc-0 kernel: [1294776.892666]
0000000000000004 0000000000000009 ffffffff81c0f0c8 0000000000000009
Sep 5 01:05:02.092572 vsa-0000000f-vc-0 kernel: [1294776.892666]
Call Trace:
Sep 5 01:05:02.092573 vsa-0000000f-vc-0 kernel: [1294776.892666] <IRQ>
Sep 5 01:05:02.092574 vsa-0000000f-vc-0 kernel: [1294776.892666]
[<ffffffff810d491c>] rcu_process_callbacks+0xcc/0x610
Sep 5 01:05:02.092576 vsa-0000000f-vc-0 kernel: [1294776.892666]
[<ffffffff81077025>] __do_softirq+0xf5/0x320
Sep 5 01:05:02.092578 vsa-0000000f-vc-0 kernel: [1294776.892666]
[<ffffffff81077575>] irq_exit+0x115/0x120
Sep 5 01:05:02.092579 vsa-0000000f-vc-0 kernel: [1294776.892666]
[<ffffffff8171a89a>] smp_apic_timer_interrupt+0x4a/0x60
Sep 5 01:05:02.092579 vsa-0000000f-vc-0 kernel: [1294776.892666]
[<ffffffff8171896d>] apic_timer_interrupt+0x6d/0x80
Sep 5 01:05:02.092580 vsa-0000000f-vc-0 kernel: [1294776.892666] <EOI>
Sep 5 01:05:02.092581 vsa-0000000f-vc-0 kernel: [1294776.892666]
[<ffffffff817179cd>] ? system_call_fastpath+0x16/0x1b
Sep 5 01:05:02.092582 vsa-0000000f-vc-0 kernel: [1294776.892666]
Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 50 11 00 00 31 c0
48 8b 97 48 11 00 00 48 89 e5 48 39 d1 74 02 5d c3 48 8b 47 10 83
<c0> 01 83 e0 01 48 83 c0 20 8b 44 87 20 85 c0 75 11 48 83 7e 48
Sep 5 01:05:02.092585 vsa-0000000f-vc-0 kernel: [1294776.892666]
RIP [<ffffffff810d12e5>] cpu_needs_another_gp+0x25/0x80
Sep 5 01:05:02.092586 vsa-0000000f-vc-0 kernel: [1294776.892666]
RSP <ffff8808bfca3e88>
Sep 5 01:05:02.092587 vsa-0000000f-vc-0 kernel: [1294776.892666]
CR2: fffffffffffffeda
Sep 5 01:05:02.092588 vsa-0000000f-vc-0 kernel: [1294776.892666]
---[ end trace 9b3c5d4642bb89b5 ]---

New one on me! If this is reproducible, and if you have some other
version where it is not happening, do a bisection. If you have a set
of patches that you carry on top of the stable kernel (for example, to
support some new hardware), try reproducing on hardware that is supported
natively by 3.18.19. Either way, CONFIG_DEBUG_OBJECTS_RCU_HEAD can be
helpful, as can any number of other debugging Kconfig options.

Thanx, Paul