WARNING and PANIC in irq_matrix_free

From: Tariq Toukan
Date: Tue Feb 20 2018 - 07:07:50 EST


Hi Thomas,

We started seeing new issues in our net-device daily regression tests.
They are related to patch [1], introduced in kernel 4.15-rc1.

We frequently see a warning in dmesg [2]. The repro is not consistent; we tried to narrow it down to a smaller run but couldn't.

In addition, less frequently, the warning is followed by a panic [3].

I can share any details needed to help analyze this bug.
If you suspect specific flows, we can narrow it down in a more targeted way.

Regards,
Tariq


[1] 2f75d9e1c905 genirq: Implement bitmap matrix allocator

[2]
[ 8664.868564] WARNING: CPU: 5 PID: 0 at kernel/irq/matrix.c:370 irq_matrix_free+0x30/0xd0
[ 8664.891905] Modules linked in: bonding rdma_ucm ib_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad mlx5_ib mlx5_core mlxfw mlx4_ib ib_core mlx4_en mlx4_core devlink macvlan vxlan ip6_udp_tunnel udp_tunnel 8021q garp mrp stp llc mst_pciconf(OE) nfsv3 nfs fscache netconsole dm_mirror dm_region_hash dm_log dm_mod dax kvm_intel kvm irqbypass pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ata_generic cirrus drm_kms_helper syscopyarea sysfillrect pata_acpi sysimgblt fb_sys_fops ttm drm e1000 serio_raw virtio_console i2c_core floppy ata_piix [last unloaded: mst_pci]
[ 8664.905117] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G OE 4.15.0-for-upstream-perf-2018-02-08_07-00-42-18 #1
[ 8664.907613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 8664.910144] RIP: 0010:irq_matrix_free+0x30/0xd0
[ 8664.912624] RSP: 0018:ffff88023fd43f70 EFLAGS: 00010002
[ 8664.915149] RAX: 0000000000026318 RBX: ffff880157a77ec0 RCX: 0000000000000000
[ 8664.917679] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff880237038400
[ 8664.920244] RBP: ffff880237038400 R08: 00000000e8ba3c69 R09: 0000000000000000
[ 8664.922813] R10: 00000000000003ff R11: 0000000000000ad9 R12: ffff88023fc40000
[ 8664.925345] R13: 0000000000000000 R14: 0000000000000001 R15: 000000000000002b
[ 8664.927872] FS: 0000000000000000(0000) GS:ffff88023fd40000(0000) knlGS:0000000000000000
[ 8664.930455] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8664.932996] CR2: 0000000000f2c030 CR3: 000000000220a000 CR4: 00000000000006e0
[ 8664.935557] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8664.938051] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8664.940541] Call Trace:
[ 8664.942980] <IRQ>
[ 8664.945399] free_moved_vector+0x4e/0x100
[ 8664.947787] smp_irq_move_cleanup_interrupt+0x89/0x9e
[ 8664.950134] irq_move_cleanup_interrupt+0x95/0xa0
[ 8664.952480] </IRQ>
[ 8664.954800] RIP: 0010:native_safe_halt+0x2/0x10
[ 8664.957052] RSP: 0018:ffffc90000ccfee0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdf
[ 8664.959186] RAX: ffffffff818ab6e0 RBX: ffff880236233f00 RCX: 0000000000000000
[ 8664.960499] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 8664.961774] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
[ 8664.963048] R10: 00000000000003ff R11: 0000000000000ad9 R12: ffff880236233f00
[ 8664.964345] R13: ffff880236233f00 R14: 0000000000000000 R15: 0000000000000000
[ 8664.965579] ? __cpuidle_text_start+0x8/0x8
[ 8664.966808] default_idle+0x18/0xf0
[ 8664.968040] do_idle+0x150/0x1d0
[ 8664.969249] cpu_startup_entry+0x19/0x20
[ 8664.970477] start_secondary+0x133/0x170
[ 8664.971700] secondary_startup_64+0xa5/0xb0
[ 8664.972909] Code: 41 56 41 89 f6 41 55 41 89 d5 89 f2 41 54 4c 8b 24 d5 60 24 18 82 55 48 89 fd 53 48 8b 47 28 44 39 6f 04 77 06 44 3b 6f 08 72 0b <0f> ff 5b 5d 41 5c 41 5d 41 5e c3 49 01 c4 41 80 7c 24 0c 00 74
[ 8664.975420] ---[ end trace 8be4ba51cd83f4bd ]---


[3]
[ 8943.038767] BUG: unable to handle kernel paging request at 000000037a6b561b
[ 8943.040114] IP: free_moved_vector+0x61/0x100
[ 8943.041531] PGD 0 P4D 0
[ 8943.042855] Oops: 0002 [#1] SMP PTI
[ 8943.044128] Modules linked in: bonding rdma_ucm ib_ucm rdma_cm iw_cm ib_ipoib ib_cm ib_uverbs ib_umad mlx5_ib mlx5_core mlxfw mlx4_ib ib_core mlx4_en mlx4_core devlink iptable_filter fuse btrfs xor zstd_decompress zstd_compress xxhash raid6_pq vfat msdos fat binfmt_misc bridge macvlan vxlan ip6_udp_tunnel udp_tunnel 8021q garp mrp stp llc mst_pciconf(OE) nfsv3 nfs fscache netconsole dm_mirror dm_region_hash dm_log dm_mod dax kvm_intel kvm irqbypass pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ata_generic cirrus drm_kms_helper syscopyarea sysfillrect pata_acpi sysimgblt fb_sys_fops ttm drm e1000 serio_raw virtio_console i2c_core floppy ata_piix [last unloaded: mst_pci]
[ 8943.052038] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W OE 4.15.0-for-upstream-perf-2018-02-08_07-00-42-18 #1
[ 8943.053350] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 8943.054654] RIP: 0010:free_moved_vector+0x61/0x100
[ 8943.055940] RSP: 0018:ffff88023fd43fa0 EFLAGS: 00010007
[ 8943.057233] RAX: 000000037a6b561b RBX: ffff880157a77ec0 RCX: 0000000000000001
[ 8943.058506] RDX: 00000000000155a8 RSI: 00000000000155a8 RDI: ffff880237038400
[ 8943.059784] RBP: ffff880157a77ec0 R08: 00000000e8ba3c69 R09: 0000000000000000
[ 8943.061051] R10: 0000000000000000 R11: 0000000000000000 R12: 000000007f0c0001
[ 8943.062462] R13: 00000000000155a8 R14: 0000000000000001 R15: 0000000000cc620d
[ 8943.063726] FS: 0000000000000000(0000) GS:ffff88023fd40000(0000) knlGS:0000000000000000
[ 8943.064993] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8943.066253] CR2: 000000037a6b561b CR3: 000000010badc000 CR4: 00000000000006e0
[ 8943.067522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8943.068771] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8943.070029] Call Trace:
[ 8943.071273] <IRQ>
[ 8943.072503] smp_irq_move_cleanup_interrupt+0x89/0x9e
[ 8943.073794] irq_move_cleanup_interrupt+0x95/0xa0
[ 8943.075048] </IRQ>
[ 8943.076288] RIP: 0010:native_safe_halt+0x2/0x10
[ 8943.077530] RSP: 0018:ffffc90000ccfee0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdf
[ 8943.078795] RAX: ffffffff818ab6e0 RBX: ffff880236233f00 RCX: 0000000000000000
[ 8943.080077] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 8943.081435] RBP: 0000000000000005 R08: 00000000e8ba3c69 R09: 0000000000000000
[ 8943.082683] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880236233f00
[ 8943.083932] R13: ffff880236233f00 R14: 0000000000000000 R15: 0000000000000000
[ 8943.085185] ? __cpuidle_text_start+0x8/0x8
[ 8943.086438] default_idle+0x18/0xf0
[ 8943.087694] do_idle+0x150/0x1d0
[ 8943.088921] cpu_startup_entry+0x19/0x20
[ 8943.090163] start_secondary+0x133/0x170
[ 8943.091402] secondary_startup_64+0xa5/0xb0
[ 8943.092659] Code: 44 00 00 48 8b 3d c8 f7 9f 01 44 89 f1 44 89 e2 44 89 ee e8 e2 05 0b 00 48 c7 c0 20 50 01 00 4a 8d 04 e0 4a 03 04 ed 60 24 18 82 <48> c7 00 00 00 00 00 48 8b 45 28 48 85 c0 74 20 48 8b 55 20 48
[ 8943.095371] RIP: free_moved_vector+0x61/0x100 RSP: ffff88023fd43fa0
[ 8943.096685] CR2: 000000037a6b561b
[ 8943.098120] ---[ end trace 8be4ba51cd83f4c0 ]---
[ 8943.099387] Kernel panic - not syncing: Fatal exception in interrupt
[ 8943.101170] Kernel Offset: disabled
[ 8943.102410] ---[ end Kernel panic - not syncing: Fatal exception in interrupt