Re: cpu_needs_another_gp: unable to handle kernel paging request
From: Paul E. McKenney
Date: Thu Sep 07 2017 - 12:33:01 EST
On Thu, Sep 07, 2017 at 10:47:57AM +0300, Alex Lyakas wrote:
> Hello Paul,
>
> Thank you for your response.
>
> Can you give us a hint as to what this panic indicates? Random kernel
> memory corruption? Improper use of an RCU primitive? A hardware
> issue?
Could be any of those three. Running tests with debug Kconfig options
enabled can help locate improper use of RCU primitives, in some cases
catching the misuse with much higher probability than the misuse would
cause an actual failure. So again, I strongly encourage you to run
tests as noted in my previous message.
> This happened only once in one of the production systems, and we
> don't have a reproduction scenario unfortunately.
That does make it harder to track down, and again pushes towards running
tests with debug Kconfig options enabled.
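Before starting such a test run, it may be worth confirming that the
test kernel really was built with the options you intended. The sketch
below scans the text of a kernel .config for a list of debug options.
Only CONFIG_DEBUG_OBJECTS_RCU_HEAD is named in this thread; the other
entries in WANTED, the SAMPLE text, and the helper names are
assumptions made for illustration, so adjust the list to your tree.

```python
import re

# Options to look for. Only CONFIG_DEBUG_OBJECTS_RCU_HEAD comes from this
# thread; the remaining entries are assumed examples of related debug
# options and should be checked against your kernel's Kconfig.
WANTED = [
    "CONFIG_DEBUG_OBJECTS_RCU_HEAD",
    "CONFIG_DEBUG_OBJECTS",
    "CONFIG_PROVE_RCU",
    "CONFIG_DEBUG_LIST",
]


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m in a .config body."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=([ym])$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_options(config_text, wanted=WANTED):
    """Return the wanted options that are not enabled, in list order."""
    on = enabled_options(config_text)
    return [opt for opt in wanted if opt not in on]


# Hypothetical .config excerpt used only to exercise the helpers.
SAMPLE = """\
CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_RCU_HEAD is not set
CONFIG_PROVE_RCU=y
CONFIG_DEBUG_LIST=y
"""

if __name__ == "__main__":
    for opt in missing_options(SAMPLE):
        print("not enabled:", opt)
```

To check a real build, read the .config file from the build directory
and pass its contents to missing_options() in place of SAMPLE.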
Thanx, Paul
> Thanks,
> Alex.
>
>
> -----Original Message-----
> From: Paul E. McKenney
> Sent: Wednesday, September 06, 2017 6:02 PM
> To: Alex Lyakas
> Cc: josh@xxxxxxxxxxxxxxxx ; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: cpu_needs_another_gp: unable to handle kernel paging request
>
> On Wed, Sep 06, 2017 at 12:53:42PM +0300, Alex Lyakas wrote:
> >Hello,
> >
> >Kernel 3.18.19 hit the following panic[1]. Can you please advise on
> >how to debug this further, or whether this is a known issue that you
> >recognize?
> >
> >Thanks,
> >Alex.
> >
> >
> >[1]
> >Sep 5 01:05:02.092499 vsa-0000000f-vc-0 kernel: [1294776.890064]
> >BUG: unable to handle kernel paging request at fffffffffffffeda
> >Sep 5 01:05:02.092517 vsa-0000000f-vc-0 kernel: [1294776.890892]
> >IP: [<ffffffff810d12e5>] cpu_needs_another_gp+0x25/0x80
> >Sep 5 01:05:02.092517 vsa-0000000f-vc-0 kernel: [1294776.891007]
> >PGD 1c19067 PUD 1c1b067 PMD 0
> >Sep 5 01:05:02.092518 vsa-0000000f-vc-0 kernel: [1294776.891007]
> >Oops: 0002 [#1] PREEMPT SMP
> >Sep 5 01:05:02.092520 vsa-0000000f-vc-0 kernel: [1294776.891007]
> >Modules linked in: xt_nat(E) veth(E) xt_addrtype(E) br_netfilter(E)
> >xfrm_user(E) xfrm4_tunnel(E) tunnel4(E) ipcomp(E) xfrm_ipcomp(E)
> >esp4(E) ah4(E) 8021q(E) garp(E) mrp(E) xt_multiport(E) sd_mod(E)
> >bonding(E) ib_iser(OE) iscsi_tcp(OE) libiscsi_tcp(OE) libiscsi(OE)
> >scsi_transport_iscsi(OE) dm_zcache(OE) xfs(OE) btrfs(OE) raid456(OE)
> >async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E)
> >async_tx(E) raid6_pq(E) raid1(OE) md_mod(OE) rdma_ucm(OE)
> >ib_uverbs(OE) mlx4_ib(OE) mlx4_en(OE) ipt_MASQUERADE(E)
> >nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E)
> >nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E)
> >nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_CHECKSUM(E)
> >iptable_mangle(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) vxlan(E)
> >ip6_udp_tunnel(E) udp_tunnel(E) ptp(E) pps_core(E)
> >ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E)
> >x_tables(E) mlx4_core(OE) deflate(E) ctr(E) twofish_generic(E)
> >twofish_avx_x86_64(E) twofish_x86_64_3way(E) twofish_x86_64(E)
> >twofish_common(E) camellia_generic(E) camellia_aesni_avx2(E)
> >camellia_aesni_avx_x86_64(E) camellia_x86_64(E) serpent_avx2(E)
> >serpent_avx_x86_64(E) serpent_sse2_x86_64(E) xts(E)
> >serpent_generic(E) blowfish_generic(E) blowfish_x86_64(E)
> >blowfish_common(E) cast5_avx_x86_64(E) cast5_generic(E)
> >cast_common(E) des3_ede_x86_64(E) des_generic(E) cmac(E) xcbc(E)
> >rmd160(E) isert_scst(OE) crypto_null(E) rdma_cm(OE) af_key(E)
> >iw_cm(OE) xfrm_algo(E) ib_cm(OE) ib_sa(OE) ib_mad(OE) ib_core(OE)
> >ib_addr(OE) compat(OE) iscsi_scst(OE) scst_utgt(OE) scst_vdisk(OE)
> >libcrc32c(E) scst(OE) nls_iso8859_1(E) kvm_intel(E) kvm(E)
> >crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> >aesni_intel(E) nfsd(OE) aes_x86_64(E) lrw(E) gf128mul(E)
> >glue_helper(E) ablk_helper(E) cryptd(E) auth_rpcgss(E) nfs_acl(E)
> >mac_hid(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E)
> >dm_multipath(OE) scsi_dh(E) ttm(E) drm_kms_helper(E) serio_raw(E)
> >drm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) i2c_piix4(E)
> >i6300esb(E) lp(E) parport(E) dm_iostat(OE) ata_generic(E)
> >pata_acpi(E) ata_piix(E) libata(E) psmouse(E) scsi_mod(OE)
> >Sep 5 01:05:02.092522 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >CPU: 5 PID: 14385 Comm: aws Tainted: G W OE
> >3.18.19-zadara05 #1
> >Sep 5 01:05:02.092523 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >Sep 5 01:05:02.092524 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >task: ffff880022da6540 ti: ffff88000a9a4000 task.ti:
> >ffff88000a9a4000
> >Sep 5 01:05:02.092525 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >RIP: 0010:[<ffffffff810d12e5>] [<ffffffff810d12e5>]
> >cpu_needs_another_gp+0x25/0x80
> >Sep 5 01:05:02.092525 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >RSP: 0000:ffff8808bfca3e88 EFLAGS: 00010097
> >Sep 5 01:05:02.092526 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >RAX: 0000000000000000 RBX: ffffffff81c55c40 RCX: fffffffffffffeda
> >Sep 5 01:05:02.092526 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >RDX: fffffffffffffeda RSI: ffff8808bfcad600 RDI: ffffffff81c55c40
> >Sep 5 01:05:02.092527 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >RBP: ffff8808bfca3e88 R08: 00000000000021ac R09: 0000000000000100
> >Sep 5 01:05:02.092527 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000246
> >Sep 5 01:05:02.092529 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >R13: 0000000000000009 R14: 0000000000000100 R15: ffff8808bfcad600
> >Sep 5 01:05:02.092531 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >FS: 00007f158f7fe700(0000) GS:ffff8808bfca0000(0000)
> >knlGS:0000000000000000
> >Sep 5 01:05:02.092531 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >Sep 5 01:05:02.092533 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >CR2: fffffffffffffeda CR3: 0000000741e12000 CR4: 00000000003407e0
> >Sep 5 01:05:02.092554 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >Sep 5 01:05:02.092566 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >Sep 5 01:05:02.092568 vsa-0000000f-vc-0 kernel: [1294776.892666] Stack:
> >Sep 5 01:05:02.092569 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >ffff8808bfca3ef8 ffffffff810d491c ffff88088e17d838 ffff88088e17d438
> >Sep 5 01:05:02.092571 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >ffff880022da6540 ffff88000a9a7fd8 ffff8808bfca3eb8 ffff880799cad868
> >Sep 5 01:05:02.092572 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >0000000000000004 0000000000000009 ffffffff81c0f0c8 0000000000000009
> >Sep 5 01:05:02.092572 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >Call Trace:
> >Sep 5 01:05:02.092573 vsa-0000000f-vc-0 kernel: [1294776.892666] <IRQ>
> >Sep 5 01:05:02.092574 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >[<ffffffff810d491c>] rcu_process_callbacks+0xcc/0x610
> >Sep 5 01:05:02.092576 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >[<ffffffff81077025>] __do_softirq+0xf5/0x320
> >Sep 5 01:05:02.092578 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >[<ffffffff81077575>] irq_exit+0x115/0x120
> >Sep 5 01:05:02.092579 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >[<ffffffff8171a89a>] smp_apic_timer_interrupt+0x4a/0x60
> >Sep 5 01:05:02.092579 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >[<ffffffff8171896d>] apic_timer_interrupt+0x6d/0x80
> >Sep 5 01:05:02.092580 vsa-0000000f-vc-0 kernel: [1294776.892666] <EOI>
> >Sep 5 01:05:02.092581 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >[<ffffffff817179cd>] ? system_call_fastpath+0x16/0x1b
> >Sep 5 01:05:02.092582 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 50 11 00 00 31 c0
> >48 8b 97 48 11 00 00 48 89 e5 48 39 d1 74 02 5d c3 48 8b 47 10 83
> ><c0> 01 83 e0 01 48 83 c0 20 8b 44 87 20 85 c0 75 11 48 83 7e 48
> >Sep 5 01:05:02.092585 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >RIP [<ffffffff810d12e5>] cpu_needs_another_gp+0x25/0x80
> >Sep 5 01:05:02.092586 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >RSP <ffff8808bfca3e88>
> >Sep 5 01:05:02.092587 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >CR2: fffffffffffffeda
> >Sep 5 01:05:02.092588 vsa-0000000f-vc-0 kernel: [1294776.892666]
> >---[ end trace 9b3c5d4642bb89b5 ]---
>
> New one on me! If this is reproducible, and if you have some other
> version where it is not happening, do a bisection. If you have a set
> of patches that you carry on top of the stable kernel (for example, to
> support some new hardware), try reproducing on hardware that is supported
> natively by 3.18.19. Either way, CONFIG_DEBUG_OBJECTS_RCU_HEAD can be
> helpful, as can any number of other debugging Kconfig options.
>
> Thanx, Paul
>
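
For readers working through the trace in [1], two of its fields can be
decoded mechanically: the "Oops: 0002" value is the x86 page-fault
error code, and CR2 holds the faulting address. The sketch below
decodes the low bits of the error code and reinterprets the address as
a signed 64-bit value. The helper names are hypothetical, and reading a
near-top-of-address-space value as a small negative number is only a
heuristic for spotting a non-pointer used as a pointer, not a
diagnosis.

```python
def decode_fault_error_code(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x01),       # 0 means no page was mapped
        "write": bool(code & 0x02),              # 1 means a write access
        "user_mode": bool(code & 0x04),          # 0 means kernel mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in page tables
        "instruction_fetch": bool(code & 0x10),  # fault on instruction fetch
    }


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a two's-complement integer."""
    return value - (1 << 64) if value & (1 << 63) else value


# Values copied from the trace: "Oops: 0002" and "CR2: fffffffffffffeda".
info = decode_fault_error_code(0x0002)
offset = as_signed64(0xFFFFFFFFFFFFFEDA)

if __name__ == "__main__":
    print(info)
    print("CR2 as signed 64-bit:", offset)
```

For this trace, the decoded code describes a kernel-mode write to an
address with no page mapped, and the address reads as -294 when
treated as a signed value.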