Linux 4.14-rc6: WARNING: CPU: 9 PID: 5377 at arch/x86/events/intel/core.c:2228 intel_pmu_handle_irq+0x4a8/0x4c0
From: Fengguang Wu
Date: Mon Oct 30 2017 - 02:27:49 EST
CC perf maintainers.
On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
Hi Linus,
Up to now we see the below boot error/warnings when testing v4.14-rc6.
They hit the RC release mainly due to various imperfections in 0day's
auto bisection. So I manually list them here and CC the likely easy to
debug ones to the corresponding maintainers in the followup emails.
boot_successes: 4700
boot_failures: 247
...
WARNING:at_arch/x86/events/intel/core.c:#intel_pmu_handle_irq: 1
This happens rarely, hence hard to bisect:
[ 189.480568] perf: interrupt took too long (5132 > 4982), lowering kernel.perf_event_max_sample_rate to 38000
[ 189.690660] perf: interrupt took too long (6582 > 6415), lowering kernel.perf_event_max_sample_rate to 30000
[ 189.901706] perf: interrupt took too long (8268 > 8227), lowering kernel.perf_event_max_sample_rate to 24000
[ 272.841032] perfevents: irq loop stuck!
[ 272.841038] ------------[ cut here ]------------
[ 272.841046] WARNING: CPU: 9 PID: 5377 at arch/x86/events/intel/core.c:2228 intel_pmu_handle_irq+0x4a8/0x4c0
[ 272.841047] Modules linked in: xfs loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver btrfs xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod sg intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 kvm_intel ttm kvm irqbypass crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea crc32c_intel sysfillrect sysimgblt snd_pcm ghash_clmulni_intel fb_sys_fops ahci pcbc libahci snd_timer nvme aesni_intel snd crypto_simd ipmi_si glue_helper mxm_wmi soundcore drm nvme_core cryptd ipmi_devintf pcspkr libata shpchp ipmi_msghandler wmi acpi_power_meter acpi_pad ip_tables
[ 272.841083] CPU: 9 PID: 5377 Comm: usemem Not tainted 4.14.0-rc6 #1
[ 272.841084] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[ 272.841085] task: ffff881011adcd00 task.stack: ffffc90008fac000
[ 272.841087] RIP: 0010:intel_pmu_handle_irq+0x4a8/0x4c0
[ 272.841089] RSP: 0000:ffff88103f445c00 EFLAGS: 00010086
[ 272.841090] RAX: 000000000000001b RBX: 0000000000000064 RCX: 0000000000000000
[ 272.841091] RDX: ffff88103f456180 RSI: ffff88103f44e018 RDI: ffff88103f44e018
[ 272.841092] RBP: ffff88103f445df0 R08: 0000000000000000 R09: 000000000000001b
[ 272.841093] R10: ffff88103f445c00 R11: 00000000000da200 R12: ffff88103f44a3a0
[ 272.841094] R13: ffff88103a27e000 R14: 0000000000000040 R15: ffff88103f44a5a0
[ 272.841095] FS: 00007f8bed879700(0000) GS:ffff88103f440000(0000) knlGS:0000000000000000
[ 272.841096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 272.841097] CR2: 00007401a7800000 CR3: 0000001015908006 CR4: 00000000003606e0
[ 272.841098] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 272.841099] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 272.841100] Call Trace:
[ 272.841101] <NMI>
[ 272.841111] perf_event_nmi_handler+0x2c/0x50
[ 272.841116] ? sched_clock+0x9/0x10
[ 272.841118] ? sched_clock+0x9/0x10
[ 272.841120] ? perf_event_nmi_handler+0x2c/0x50
[ 272.841130] nmi_handle+0x71/0x130
[ 272.841132] default_do_nmi+0x53/0x110
[ 272.841133] do_nmi+0xec/0x140
[ 272.841138] end_repeat_nmi+0x1a/0x1e
[ 272.841141] RIP: 0010:native_write_msr+0x6/0x30
[ 272.841142] RSP: 0000:ffff88103f443e60 EFLAGS: 00000046
[ 272.841143] RAX: 00000000000000b0 RBX: ffff88103a27e000 RCX: 000000000000038d
[ 272.841144] RDX: 0000000000000000 RSI: 00000000000000b0 RDI: 000000000000038d
[ 272.841145] RBP: ffff88103f443e80 R08: 0000000000000007 R09: 0000000000000000
[ 272.841146] R10: ffff88103f443df0 R11: 00007f8bed878c20 R12: 000000000000000b
[ 272.841147] R13: 0000000000000004 R14: 0000000000000000 R15: ffff88103f4548e8
[ 272.841150] ? native_write_msr+0x6/0x30
[ 272.841152] ? native_write_msr+0x6/0x30
[ 272.841152] </NMI>
[ 272.841153] <IRQ>
[ 272.841155] ? intel_pmu_enable_event+0x19c/0x1d0
[ 272.841157] x86_pmu_start+0x7a/0xa0
[ 272.841159] x86_pmu_enable+0x272/0x2e0
[ 272.841166] ? __perf_install_in_context+0x160/0x160
[ 272.841168] perf_pmu_enable+0x7/0x10
[ 272.841170] perf_mux_hrtimer_handler+0x1bc/0x1f0
[ 272.841175] __hrtimer_run_queues+0xdd/0x230
[ 272.841177] hrtimer_interrupt+0xa3/0x1f0
[ 272.841179] smp_apic_timer_interrupt+0x5f/0x140
[ 272.841181] apic_timer_interrupt+0x9d/0xb0
[ 272.841182] </IRQ>
[ 272.841184] RIP: 0033:0x5631f31497fd
[ 272.841185] RSP: 002b:00007f8bed878c38 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[ 272.841186] RAX: 0000000000000000 RBX: 000000081aead944 RCX: 000000003a329d08
[ 272.841187] RDX: 0000000000000001 RSI: 000000081aead944 RDI: 000073c0d038b000
[ 272.841188] RBP: 000000081aead944 R08: 00007f8bed878cdc R09: 0000000000000001
[ 272.841189] R10: 00007f8bed878c20 R11: 00007f8bed878c20 R12: 0000000000000001
[ 272.841190] R13: 00007f8bed878cdc R14: 000073c0d038b000 R15: 00000040d756ca20
[ 272.841192] Code: ff ff 48 89 c2 e8 b9 81 05 00 66 90 48 8b bd 40 fe ff ff 57 9d 0f 1f 44 00 00 e9 b2 fd ff ff 48 c7 c7 22 db c8 81 e8 89 44 0d 00 <0f> ff e8 f1 a7 ff ff c6 05 8a 16 2e 01 01 e9 50 fe ff ff 0f 1f
[ 272.841217] ---[ end trace 53c053df5268aee8 ]---
[ 272.841218]
Thanks,
Fengguang