Re: cpu_needs_another_gp: unable to handle kernel paging request

From: Paul E. McKenney
Date: Wed Sep 06 2017 - 11:02:14 EST


On Wed, Sep 06, 2017 at 12:53:42PM +0300, Alex Lyakas wrote:
> Hello,
>
> Kernel 3.18.19 hit the following panic[1]. Can you please advise on
> how to debug this further, or if there is any known issue that you
> recognize.
>
> Thanks,
> Alex.
>
>
> [1]
> Sep 5 01:05:02.092499 vsa-0000000f-vc-0 kernel: [1294776.890064]
> BUG: unable to handle kernel paging request at fffffffffffffeda
> Sep 5 01:05:02.092517 vsa-0000000f-vc-0 kernel: [1294776.890892]
> IP: [<ffffffff810d12e5>] cpu_needs_another_gp+0x25/0x80
> Sep 5 01:05:02.092517 vsa-0000000f-vc-0 kernel: [1294776.891007]
> PGD 1c19067 PUD 1c1b067 PMD 0
> Sep 5 01:05:02.092518 vsa-0000000f-vc-0 kernel: [1294776.891007]
> Oops: 0002 [#1] PREEMPT SMP
> Sep 5 01:05:02.092520 vsa-0000000f-vc-0 kernel: [1294776.891007]
> Modules linked in: xt_nat(E) veth(E) xt_addrtype(E) br_netfilter(E)
> xfrm_user(E) xfrm4_tunnel(E) tunnel4(E) ipcomp(E) xfrm_ipcomp(E)
> esp4(E) ah4(E) 8021q(E) garp(E) mrp(E) xt_multiport(E) sd_mod(E)
> bonding(E) ib_iser(OE) iscsi_tcp(OE) libiscsi_tcp(OE) libiscsi(OE)
> scsi_transport_iscsi(OE) dm_zcache(OE) xfs(OE) btrfs(OE) raid456(OE)
> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E)
> async_tx(E) raid6_pq(E) raid1(OE) md_mod(OE) rdma_ucm(OE)
> ib_uverbs(OE) mlx4_ib(OE) mlx4_en(OE) ipt_MASQUERADE(E)
> nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E)
> nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E)
> nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_CHECKSUM(E)
> iptable_mangle(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) vxlan(E)
> ip6_udp_tunnel(E) udp_tunnel(E) ptp(E) pps_core(E)
> ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E)
> x_tables(E) mlx4_core(OE) deflate(E) ctr(E) twofish_generic(E)
> twofish_avx_x86_64(E) twofish_x86_64_3way(E) twofish_x86_64(E)
> twofish_common(E) camellia_generic(E) camellia_aesni_avx2(E)
> camellia_aesni_avx_x86_64(E) camellia_x86_64(E) serpent_avx2(E)
> serpent_avx_x86_64(E) serpent_sse2_x86_64(E) xts(E)
> serpent_generic(E) blowfish_generic(E) blowfish_x86_64(E)
> blowfish_common(E) cast5_avx_x86_64(E) cast5_generic(E)
> cast_common(E) des3_ede_x86_64(E) des_generic(E) cmac(E) xcbc(E)
> rmd160(E) isert_scst(OE) crypto_null(E) rdma_cm(OE) af_key(E)
> iw_cm(OE) xfrm_algo(E) ib_cm(OE) ib_sa(OE) ib_mad(OE) ib_core(OE)
> ib_addr(OE) compat(OE) iscsi_scst(OE) scst_utgt(OE) scst_vdisk(OE)
> libcrc32c(E) scst(OE) nls_iso8859_1(E) kvm_intel(E) kvm(E)
> crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> aesni_intel(E) nfsd(OE) aes_x86_64(E) lrw(E) gf128mul(E)
> glue_helper(E) ablk_helper(E) cryptd(E) auth_rpcgss(E) nfs_acl(E)
> mac_hid(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E)
> dm_multipath(OE) scsi_dh(E) ttm(E) drm_kms_helper(E) serio_raw(E)
> drm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) i2c_piix4(E)
> i6300esb(E) lp(E) parport(E) dm_iostat(OE) ata_generic(E)
> pata_acpi(E) ata_piix(E) libata(E) psmouse(E) scsi_mod(OE)
> Sep 5 01:05:02.092522 vsa-0000000f-vc-0 kernel: [1294776.892666]
> CPU: 5 PID: 14385 Comm: aws Tainted: G W OE
> 3.18.19-zadara05 #1
> Sep 5 01:05:02.092523 vsa-0000000f-vc-0 kernel: [1294776.892666]
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Sep 5 01:05:02.092524 vsa-0000000f-vc-0 kernel: [1294776.892666]
> task: ffff880022da6540 ti: ffff88000a9a4000 task.ti:
> ffff88000a9a4000
> Sep 5 01:05:02.092525 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RIP: 0010:[<ffffffff810d12e5>] [<ffffffff810d12e5>]
> cpu_needs_another_gp+0x25/0x80
> Sep 5 01:05:02.092525 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RSP: 0000:ffff8808bfca3e88 EFLAGS: 00010097
> Sep 5 01:05:02.092526 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RAX: 0000000000000000 RBX: ffffffff81c55c40 RCX: fffffffffffffeda
> Sep 5 01:05:02.092526 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RDX: fffffffffffffeda RSI: ffff8808bfcad600 RDI: ffffffff81c55c40
> Sep 5 01:05:02.092527 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RBP: ffff8808bfca3e88 R08: 00000000000021ac R09: 0000000000000100
> Sep 5 01:05:02.092527 vsa-0000000f-vc-0 kernel: [1294776.892666]
> R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000246
> Sep 5 01:05:02.092529 vsa-0000000f-vc-0 kernel: [1294776.892666]
> R13: 0000000000000009 R14: 0000000000000100 R15: ffff8808bfcad600
> Sep 5 01:05:02.092531 vsa-0000000f-vc-0 kernel: [1294776.892666]
> FS: 00007f158f7fe700(0000) GS:ffff8808bfca0000(0000)
> knlGS:0000000000000000
> Sep 5 01:05:02.092531 vsa-0000000f-vc-0 kernel: [1294776.892666]
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 5 01:05:02.092533 vsa-0000000f-vc-0 kernel: [1294776.892666]
> CR2: fffffffffffffeda CR3: 0000000741e12000 CR4: 00000000003407e0
> Sep 5 01:05:02.092554 vsa-0000000f-vc-0 kernel: [1294776.892666]
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 5 01:05:02.092566 vsa-0000000f-vc-0 kernel: [1294776.892666]
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep 5 01:05:02.092568 vsa-0000000f-vc-0 kernel: [1294776.892666] Stack:
> Sep 5 01:05:02.092569 vsa-0000000f-vc-0 kernel: [1294776.892666]
> ffff8808bfca3ef8 ffffffff810d491c ffff88088e17d838 ffff88088e17d438
> Sep 5 01:05:02.092571 vsa-0000000f-vc-0 kernel: [1294776.892666]
> ffff880022da6540 ffff88000a9a7fd8 ffff8808bfca3eb8 ffff880799cad868
> Sep 5 01:05:02.092572 vsa-0000000f-vc-0 kernel: [1294776.892666]
> 0000000000000004 0000000000000009 ffffffff81c0f0c8 0000000000000009
> Sep 5 01:05:02.092572 vsa-0000000f-vc-0 kernel: [1294776.892666]
> Call Trace:
> Sep 5 01:05:02.092573 vsa-0000000f-vc-0 kernel: [1294776.892666] <IRQ>
> Sep 5 01:05:02.092574 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [<ffffffff810d491c>] rcu_process_callbacks+0xcc/0x610
> Sep 5 01:05:02.092576 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [<ffffffff81077025>] __do_softirq+0xf5/0x320
> Sep 5 01:05:02.092578 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [<ffffffff81077575>] irq_exit+0x115/0x120
> Sep 5 01:05:02.092579 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [<ffffffff8171a89a>] smp_apic_timer_interrupt+0x4a/0x60
> Sep 5 01:05:02.092579 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [<ffffffff8171896d>] apic_timer_interrupt+0x6d/0x80
> Sep 5 01:05:02.092580 vsa-0000000f-vc-0 kernel: [1294776.892666] <EOI>
> Sep 5 01:05:02.092581 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [<ffffffff817179cd>] ? system_call_fastpath+0x16/0x1b
> Sep 5 01:05:02.092582 vsa-0000000f-vc-0 kernel: [1294776.892666]
> Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 50 11 00 00 31 c0
> 48 8b 97 48 11 00 00 48 89 e5 48 39 d1 74 02 5d c3 48 8b 47 10 83
> <c0> 01 83 e0 01 48 83 c0 20 8b 44 87 20 85 c0 75 11 48 83 7e 48
> Sep 5 01:05:02.092585 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RIP [<ffffffff810d12e5>] cpu_needs_another_gp+0x25/0x80
> Sep 5 01:05:02.092586 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RSP <ffff8808bfca3e88>
> Sep 5 01:05:02.092587 vsa-0000000f-vc-0 kernel: [1294776.892666]
> CR2: fffffffffffffeda
> Sep 5 01:05:02.092588 vsa-0000000f-vc-0 kernel: [1294776.892666]
> ---[ end trace 9b3c5d4642bb89b5 ]---

New one on me! If this is reproducible, and if you have some other
version where it does not happen, bisect between the two versions. If
you carry a set of patches on top of the stable kernel (for example, to
support some new hardware), try reproducing on hardware that is supported
natively by 3.18.19. Either way, CONFIG_DEBUG_OBJECTS_RCU_HEAD can be
helpful, as can any number of other debugging Kconfig options.
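
Before bisecting, it can help to decode the fault details already present
in the quoted oops. The following is only a minimal sketch, assuming the
standard x86 page-fault error-code bit layout; the function names are
illustrative and are not part of any kernel tooling.

```python
# Sketch: decode the "Oops: 0002" error code and the CR2 fault address
# from the quoted report. Helper names are hypothetical.

def decode_pf_error_code(code: int) -> dict:
    """Split an x86 page-fault error code into its flag bits."""
    return {
        "page_present": bool(code & 0x01),       # 0 means page not present
        "write": bool(code & 0x02),              # 1 means write access
        "user_mode": bool(code & 0x04),          # 0 means kernel mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in page table
        "instruction_fetch": bool(code & 0x10),  # fault on instruction fetch
    }

def as_signed64(addr: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return addr - (1 << 64) if addr & (1 << 63) else addr

oops_code = int("0002", 16)        # from "Oops: 0002 [#1] PREEMPT SMP"
fault_addr = 0xfffffffffffffeda    # from "CR2: fffffffffffffeda"

info = decode_pf_error_code(oops_code)
offset = as_signed64(fault_addr)

print(info)
print(offset)
```

Under that assumption, the report describes a kernel-mode write to a
non-present page, and the faulting address reads as -294 when treated as
a signed 64-bit value. That narrows what to look for, but it does not by
itself identify the cause.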

Thanx, Paul