Re: "irq/matrix: Spread interrupts on allocation" breaks nouveau in mainline kernel

From: Lyude Paul
Date: Tue Jan 23 2018 - 20:27:07 EST


JFYI: I confirmed this patch is definitely broken. I'm seeing nouveau get
assigned the same MSI vector as another device on the system, which would
explain why interrupts suddenly stop working. I'll keep looking into it
tomorrow.
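
For anyone trying to reproduce this: the "do_IRQ: 1.35 No irq handler for
vector" line in the quoted log below encodes the CPU and the vector number as
CPU.vector (CPU 1, vector 35 here). A minimal sketch for pulling those events
out of a saved dmesg dump follows; the function name find_unhandled_vectors is
made up for illustration and is not part of any kernel tooling.

```python
import re

# Matches lines such as:
#   [ 31.255563] do_IRQ: 1.35 No irq handler for vector
# The two numbers are the CPU and the vector, in that order.
PATTERN = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def find_unhandled_vectors(dmesg_text):
    """Return a list of (cpu, vector) pairs reported by do_IRQ.

    Hypothetical helper: it only scans text that was already captured,
    e.g. via `dmesg > log.txt`; it does not query the kernel itself.
    """
    hits = []
    for line in dmesg_text.splitlines():
        match = PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


if __name__ == "__main__":
    sample = "[ 31.255563] do_IRQ: 1.35 No irq handler for vector"
    print(find_unhandled_vectors(sample))  # [(1, 35)]
```

This only shows which CPU took an interrupt on a vector with no handler; it
does not by itself prove that two devices share the vector.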

On Tue, 2018-01-23 at 17:01 -0500, Lyude Paul wrote:
> Hi! Sorry to be the bearer of bad news, but this patch actually seems to break
> suspending and resuming with nouveau on my machine:
>
> [ 29.694755] PM: suspend entry (deep)
> [ 29.694773] PM: Syncing filesystems ... done.
> [ 29.696203] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [ 29.697442] OOM killer disabled.
> [ 29.697448] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
> [ 29.698232] Suspending console(s) (use no_console_suspend to debug)
> [ 29.698993] serial 00:05: disabled
> [ 29.708227] sd 4:0:0:0: [sda] Synchronizing SCSI cache
> [ 29.708428] sd 4:0:0:0: [sda] Stopping disk
> [ 30.614581] ACPI: Preparing to enter system sleep state S3
> [ 30.917726] PM: Saving platform NVS memory
> [ 30.917736] Disabling non-boot CPUs ...
> [ 30.925616] smpboot: CPU 1 is now offline
> [ 30.936915] smpboot: CPU 2 is now offline
> [ 30.952824] smpboot: CPU 3 is now offline
> [ 30.964764] smpboot: CPU 4 is now offline
> [ 30.980663] smpboot: CPU 5 is now offline
> [ 30.992692] smpboot: CPU 6 is now offline
> [ 31.002572] smpboot: CPU 7 is now offline
> [ 31.003130] ACPI: Low-level resume complete
> [ 31.003180] PM: Restoring platform NVS memory
> [ 31.003578] WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
> [ 31.003578] Modules linked in: nouveau video mxm_wmi i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm vfat fat usbhid crc32_pclmul i2c_piix4 i2c_core shpchp k10temp wmi acpi_cpufreq crc32c_intel r8169 mii xhci_pci xhci_hcd w83627hf_wdt
> [ 31.003590] CPU: 0 PID: 11523 Comm: rtcwake Not tainted 4.15.0-rc8nouveau-clockgating+ #1
> [ 31.003591] Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.60 09/19/2017
> [ 31.003592] RIP: 0010:smp_call_function_single+0xdc/0xe0
> [ 31.003593] RSP: 0018:ffffc900004a3c40 EFLAGS: 00010046
> [ 31.003594] RAX: 0000000000000000 RBX: ffffc900004a3cdc RCX: 0000000000000001
> [ 31.003594] RDX: ffffc900004a3c98 RSI: ffffffff8137a180 RDI: 0000000000000000
> [ 31.003595] RBP: ffffc900004a3c70 R08: 0000000000000001 R09: 0000000000010000
> [ 31.003595] R10: ffffc900004a3c98 R11: 0000000000000000 R12: 0000000000000000
> [ 31.003596] R13: 0000000001000000 R14: ffffc900004a3d0c R15: 0000000000000000
> [ 31.003597] FS: 00007f03bee93540(0000) GS:ffff88021ae00000(0000) knlGS:0000000000000000
> [ 31.003597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 31.003598] CR2: 00007fffb6673008 CR3: 000000020ddd4000 CR4: 00000000003406f0
> [ 31.003598] Call Trace:
> [ 31.003603] ? rdmsr_safe_on_cpu+0x4b/0x70
> [ 31.003604] rdmsr_safe_on_cpu+0x4b/0x70
> [ 31.003606] get_block_address.isra.0+0x6e/0xe0
> [ 31.003607] mce_amd_feature_init+0x63/0x2c0
> [ 31.003609] mce_syscore_resume+0x1e/0x30
> [ 31.003611] syscore_resume+0x4b/0x170
> [ 31.003613] suspend_devices_and_enter+0x608/0x7e0
> [ 31.003614] pm_suspend+0x315/0x380
> [ 31.003615] state_store+0x7d/0xe0
> [ 31.003618] kernfs_fop_write+0xfa/0x180
> [ 31.003620] __vfs_write+0x23/0x130
> [ 31.003623] ? SYSC_newfstat+0x29/0x40
> [ 31.003625] ? _cond_resched+0x15/0x40
> [ 31.003626] vfs_write+0xad/0x1a0
> [ 31.003627] SyS_write+0x42/0x90
> [ 31.003629] entry_SYSCALL_64_fastpath+0x24/0x87
> [ 31.003630] RIP: 0033:0x7f03be9ae8f4
> [ 31.003631] RSP: 002b:00007ffe6bf825f8 EFLAGS: 00000246
> [ 31.003632] Code: fe ff ff 8b 55 e8 83 e2 01 74 0a f3 90 8b 55 e8 83 e2 01 75 f6 48 83 c4 28 41 5a 5d 49 8d 62 f8 c3 8b 05 58 b6 48 01 85 c0 75 86 <0f> ff eb 82 0f 1f 44 00 00 f6 46 18 01 75 15 c7 46 18 01 00 00
> [ 31.003648] ---[ end trace 19fa2f7781ed5237 ]---
> [ 31.004025] Enabling non-boot CPUs ...
> [ 31.004052] x86: Booting SMP configuration:
> [ 31.004052] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [ 31.006368] cache: parent cpu1 should not be sleeping
> [ 31.006442] microcode: CPU1: patch_level=0x08001129
> [ 31.006509] CPU1 is up
> [ 31.006525] smpboot: Booting Node 0 Processor 2 APIC 0x2
> [ 31.008832] cache: parent cpu2 should not be sleeping
> [ 31.008894] microcode: CPU2: patch_level=0x08001129
> [ 31.008966] CPU2 is up
> [ 31.008975] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [ 31.011264] cache: parent cpu3 should not be sleeping
> [ 31.011329] microcode: CPU3: patch_level=0x08001129
> [ 31.011404] CPU3 is up
> [ 31.011413] smpboot: Booting Node 0 Processor 4 APIC 0x8
> [ 31.013833] cache: parent cpu4 should not be sleeping
> [ 31.013903] microcode: CPU4: patch_level=0x08001129
> [ 31.014025] CPU4 is up
> [ 31.014036] smpboot: Booting Node 0 Processor 5 APIC 0x9
> [ 31.016354] cache: parent cpu5 should not be sleeping
> [ 31.016421] microcode: CPU5: patch_level=0x08001129
> [ 31.016534] CPU5 is up
> [ 31.016544] smpboot: Booting Node 0 Processor 6 APIC 0xa
> [ 31.018857] cache: parent cpu6 should not be sleeping
> [ 31.018930] microcode: CPU6: patch_level=0x08001129
> [ 31.019047] CPU6 is up
> [ 31.019057] smpboot: Booting Node 0 Processor 7 APIC 0xb
> [ 31.021376] cache: parent cpu7 should not be sleeping
> [ 31.021444] microcode: CPU7: patch_level=0x08001129
> [ 31.021579] CPU7 is up
> [ 31.022166] ACPI: Waking up from system sleep state S3
> [ 31.070791] usb usb1: root hub lost power or was reset
> [ 31.070794] usb usb2: root hub lost power or was reset
> [ 31.071628] serial 00:05: activated
> [ 31.080265] sd 4:0:0:0: [sda] Starting disk
> [ 31.126099] hpet_rtc_timer_reinit: 68 callbacks suppressed
> [ 31.126099] hpet1: lost 2 rtc interrupts
> [ 31.160913] r8169 0000:1e:00.0 enp30s0: link down
> [ 31.255563] do_IRQ: 1.35 No irq handler for vector
> [ 31.379537] ata6: SATA link down (SStatus 0 SControl 300)
> [ 31.379558] ata1: SATA link down (SStatus 0 SControl 300)
> [ 31.380306] ata2: SATA link down (SStatus 0 SControl 300)
> [ 31.435705] ata9: SATA link down (SStatus 0 SControl 300)
> [ 31.589932] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [ 31.590320] ata5.00: configured for UDMA/133
> [ 31.610043] usb 1-4: reset low-speed USB device number 2 using xhci_hcd
> [ 32.226138] usb 1-5: reset low-speed USB device number 3 using xhci_hcd
> [ 33.257867] nouveau 0000:22:00.0: DRM: EVO timeout
> [ 34.237185] r8169 0000:1e:00.0 enp30s0: link up
> [ 35.257880] nouveau 0000:22:00.0: DRM: base-0: timeout
> [ 37.258334] nouveau 0000:22:00.0: DRM: base-0: timeout
> [ 37.276084] OOM killer enabled.
> [ 37.276612] Restarting tasks ... done.
> [ 37.277722] PM: suspend exit
>
> I haven't yet investigated why it does this, but a bisect of master led me
> here.
>