iommu_intel or i915 regression in 4.18, 4.19.12 and drm-tip

From: Eric Wong
Date: Thu Dec 27 2018 - 06:50:23 EST


I just got a used Thinkpad X201 (Core i5 M 520, Intel QM57
chipset) and hit kernel panics when viewing image- or
animation-heavy content in Firefox (X11), unless I boot with
"intel_iommu=igfx_off".

With Debian stable backport kernels, "linux-image-4.17.0-0.bpo.3-amd64"
(4.17.17-1~bpo9+1) has no problems. But "linux-image-4.18.0-0.bpo.3-amd64"
(4.18.20-2~bpo9+1) gives a blank screen before I can log in via agetty
and run startx.

Building 4.19.12 myself got me into X11, where starting Firefox
panics the kernel. I also updated to the latest BIOS (1.40); the
laptop is EOL, but it's still the most powerful one I use. I
intend to replace the BIOS with Coreboot soon...

Initially, I thought I was hitting another GPU hang from 4.18+:

https://bugs.freedesktop.org/show_bug.cgi?id=107945

But after building drm-tip @ commit 28bb1fc015cedadf3b099b8bd0bb27609849f362
("drm-tip: 2018y-12m-25d-08h-12m-37s UTC integration manifest"),
I was still able to reproduce the panic unless I use
"intel_iommu=igfx_off". "i915.reset=1" did not help matters, either.

Below is what I got from netconsole while on drm-tip:

Kernel panic - not syncing: DMAR hardware is malfunctioning
Shutting down cpus with NMI
Kernel Offset: disabled
---[ end Kernel panic - not syncing: DMAR hardware is malfunctioning ]---
------------[ cut here ]------------
sched: Unexpected reschedule of offline CPU#3!
WARNING: CPU: 0 PID: 105 at native_smp_send_reschedule+0x34/0x40
Modules linked in: netconsole ccm snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel arc4 iwldvm aesni_intel aes_x86_64 crypto_simd cryptd mac80211 glue_helper intel_cstate iwlwifi intel_uncore i915 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper cfbfillrect syscopyarea intel_ips cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea thinkpad_acpi prime_numbers cfg80211 ledtrig_audio i2c_i801 sg snd_hda_intel led_class snd_hda_codec drm ac drm_panel_orientation_quirks snd_hwdep battery e1000e agpgart snd_hda_core snd_pcm snd_timer ptp snd soundcore pps_core ehci_pci ehci_hcd lpc_ich video mfd_core button acpi_cpufreq ecryptfs ip_tables x_tables ipv6 evdev thermal [last unloaded: netconsole]
CPU: 0 PID: 105 Comm: kworker/u8:3 Not tainted 4.20.0-rc7b1+ #1
Hardware name: LENOVO 3680FBU/3680FBU, BIOS 6QET70WW (1.40 ) 10/11/2012
Workqueue: i915 __i915_gem_free_work [i915]
RIP: 0010:native_smp_send_reschedule+0x34/0x40
Code: 05 69 c6 c9 00 73 15 48 8b 05 18 2d b3 00 be fd 00 00 00 48 8b 40 30 e9 9a 58 7d 00 89 fe 48 c7 c7 78 73 af 81 e8 dc c2 01 00 <0f> 0b c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 8b 05 0d 7d df
RSP: 0018:ffff888075003d98 EFLAGS: 00010092
RAX: 000000000000002e RBX: ffff8880751a0740 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff888075015440
RBP: ffff88806e823700 R08: 0000000000000000 R09: ffff888072fc07c0
R10: ffff888075003d60 R11: 00000000fff5c002 R12: ffff8880751a0740
R13: ffff8880751a0740 R14: 0000000000000000 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff888075000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdb1f53f000 CR3: 0000000001c0a004 CR4: 00000000000206f0
Call Trace:
<IRQ>
? check_preempt_curr+0x4e/0x90
? ttwu_do_wakeup.isra.19+0x14/0xf0
? try_to_wake_up+0x323/0x410
? autoremove_wake_function+0xe/0x30
? __wake_up_common+0x8d/0x140
? __wake_up_common_lock+0x6c/0x90
? irq_work_run_list+0x49/0x80
? tick_sched_handle.isra.6+0x50/0x50
? update_process_times+0x3b/0x50
? tick_sched_handle.isra.6+0x30/0x50
? tick_sched_timer+0x3b/0x80
? __hrtimer_run_queues+0xea/0x270
? hrtimer_interrupt+0x101/0x240
? smp_apic_timer_interrupt+0x6a/0x150
? apic_timer_interrupt+0xf/0x20
</IRQ>
? panic+0x1ca/0x212
? panic+0x1c7/0x212
? __iommu_flush_iotlb+0x19e/0x1c0
? iommu_flush_iotlb_psi+0x96/0xf0
? intel_unmap+0xbf/0xf0
? i915_gem_object_put_pages_gtt+0x36/0x220 [i915]
? drm_ht_remove+0x20/0x20 [drm]
? drm_mm_remove_node+0x1ad/0x310 [drm]
? __pm_runtime_resume+0x54/0x70
? __i915_gem_object_unset_pages+0x129/0x170 [i915]
? __i915_gem_object_put_pages+0x70/0xa0 [i915]
? __i915_gem_free_objects+0x245/0x4e0 [i915]
? __switch_to_asm+0x24/0x60
? __i915_gem_free_work+0x65/0xa0 [i915]
? process_one_work+0x1fd/0x410
? worker_thread+0x49/0x3f0
? kthread+0xf8/0x130
? process_one_work+0x410/0x410
? kthread_park+0x90/0x90
? ret_from_fork+0x35/0x40
WARNING: CPU: 0 PID: 105 at native_smp_send_reschedule+0x34/0x40
---[ end trace 7dd2184d8c86cef5 ]---
------------[ cut here ]------------
sched: Unexpected reschedule of offline CPU#2!
WARNING: CPU: 0 PID: 105 at native_smp_send_reschedule+0x34/0x40
Modules linked in: netconsole ccm snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul crc32c_intel ghash_clmulni_intel arc4 iwldvm aesni_intel aes_x86_64 crypto_simd cryptd mac80211 glue_helper intel_cstate iwlwifi intel_uncore i915 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper cfbfillrect syscopyarea intel_ips cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea thinkpad_acpi prime_numbers cfg80211 ledtrig_audio i2c_i801 sg snd_hda_intel led_class snd_hda_codec drm ac drm_panel_orientation_quirks snd_hwdep battery e1000e agpgart snd_hda_core snd_pcm snd_timer ptp snd soundcore pps_core ehci_pci ehci_hcd lpc_ich video mfd_core button acpi_cpufreq ecryptfs ip_tables x_tables ipv6 evdev thermal [last unloaded: netconsole]
CPU: 0 PID: 105 Comm: kworker/u8:3 Tainted: G W 4.20.0-rc7b1+ #1
Hardware name: LENOVO 3680FBU/3680FBU, BIOS 6QET70WW (1.40 ) 10/11/2012
Workqueue: i915 __i915_gem_free_work [i915]
RIP: 0010:native_smp_send_reschedule+0x34/0x40
Code: 05 69 c6 c9 00 73 15 48 8b 05 18 2d b3 00 be fd 00 00 00 48 8b 40 30 e9 9a 58 7d 00 89 fe 48 c7 c7 78 73 af 81 e8 dc c2 01 00 <0f> 0b c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 8b 05 0d 7d df
RSP: 0018:ffff888075003d10 EFLAGS: 00010086
RAX: 000000000000002e RBX: ffff888075120740 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff888075015440
RBP: ffff88807378b700 R08: 0000000000000000 R09: ffff888072fc07c0
R10: ffff888075003cd8 R11: 00000000ffeb4a02 R12: ffff888075120740
R13: ffff888075120740 R14: 0000000000000004 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffff888075000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdb1f53f000 CR3: 0000000001c0a004 CR4: 00000000000206f0
Call Trace:
<IRQ>
? check_preempt_curr+0x4e/0x90
? ttwu_do_wakeup.isra.19+0x14/0xf0
? try_to_wake_up+0x323/0x410
? __wake_up_common+0x8d/0x140
? ep_poll_callback+0xbd/0x2a0
? __wake_up_common+0x8d/0x140
? __wake_up_common_lock+0x6c/0x90
? irq_work_run_list+0x49/0x80
? tick_sched_handle.isra.6+0x50/0x50
? update_process_times+0x3b/0x50
? tick_sched_handle.isra.6+0x30/0x50
? tick_sched_timer+0x3b/0x80
? __hrtimer_run_queues+0xea/0x270
? hrtimer_interrupt+0x101/0x240
? smp_apic_timer_interrupt+0x6a/0x150
? apic_timer_interrupt+0xf/0x20
</IRQ>
? panic+0x1ca/0x212
? panic+0x1c7/0x212
? __iommu_flush_iotlb+0x19e/0x1c0
? iommu_flush_iotlb_psi+0x96/0xf0
? intel_unmap+0xbf/0xf0
? i915_gem_object_put_pages_gtt+0x36/0x220 [i915]
? drm_ht_remove+0x20/0x20 [drm]
---[ end trace 7dd2184d8c86cef6 ]---


Thanks. I barely use graphics and certainly not with KVM;
so I don't think I'll be missing anything with igfx_off. But
maybe this bug report can help other X201 users.
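
For anyone else trying the workaround, here is a minimal sketch for
checking whether it is present on the running kernel's command line.
It assumes the upstream spelling of the parameter,
"intel_iommu=igfx_off", and that options to intel_iommu= may be
comma-separated. The function names are my own (hypothetical), not
part of any kernel tooling.

```python
def intel_iommu_options(cmdline):
    """Collect the values of every intel_iommu= argument in a cmdline string."""
    options = []
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        # Only exact "intel_iommu=..." arguments count; a misspelled
        # key such as "iommu_intel" is ignored here, as it would be
        # by the kernel.
        if key == "intel_iommu" and sep:
            options.extend(v for v in value.split(",") if v)
    return options


def igfx_workaround_active(cmdline):
    """True if igfx_off appears among the intel_iommu= options."""
    return "igfx_off" in intel_iommu_options(cmdline)


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the affected machine, igfx_workaround_active(read_cmdline()) should
be true after rebooting with the parameter added. This only confirms
the argument was passed, not that the IOMMU actually skipped the
graphics device.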