Re: Regression in TTM driver w/Linus' master

From: Tobias Klausmann
Date: Fri Nov 24 2017 - 11:29:26 EST



On 11/24/17 4:35 PM, Christian König wrote:
On 24.11.2017 at 16:17, Tobias Klausmann wrote:

On 11/24/17 3:54 PM, Daniel Vetter wrote:
On Thu, Nov 23, 2017 at 03:24:38PM +0100, Tobias Klausmann wrote:
On 11/23/17 2:58 AM, Dave Airlie wrote:
On 23 November 2017 at 11:17, Laura Abbott <labbott@xxxxxxxxxx> wrote:
Hi,

Fedora QA testing reported a panic when booting up VMs
using qemu vga drivers
(https://paste.fedoraproject.org/paste/498yRWTCJv2LKIrmj4EliQ)

[   30.108507] ------------[ cut here ]------------
[   30.108920] kernel BUG at ./include/linux/gfp.h:408!
[   30.109356] invalid opcode: 0000 [#1] SMP
[   30.109700] Modules linked in: fuse nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw
iptable_security ebtable_filter ebtables ip6table_filter ip6_tables
snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass
ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm
joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore
parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp
mrp stp llc virtio_net
[   30.115605]  virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio
ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop
[   30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted
4.15.0-0.rc0.git6.1.fc28.x86_64 #1
[   30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.2-2.fc27 04/01/2014
[   30.118866] task: ffff923a77e03380 task.stack: ffffa78182228000
[   30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430
[   30.119810] RSP: 0000:ffffa7818222bba8 EFLAGS: 00010202
[   30.120250] RAX: 0000000000000001 RBX: 00000000014382c6 RCX:
0000000000000006
[   30.120840] RDX: 0000000000000000 RSI: 0000000000000009 RDI:
0000000000000000
[   30.121443] RBP: ffff923a760d6000 R08: 0000000000000000 R09:
0000000000000006
[   30.122039] R10: 0000000000000040 R11: 0000000000000300 R12:
ffff923a729273c0
[   30.122629] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff923a7483d400
[   30.123223] FS:  00007fe48da7dac0(0000) GS:ffff923a7cc00000(0000)
knlGS:0000000000000000
[   30.123896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.124373] CR2: 00007fe457b73000 CR3: 0000000078313000 CR4:
00000000000006f0
[   30.124968] Call Trace:
[   30.125186]  ttm_pool_populate+0x19b/0x400 [ttm]
[   30.125578]  ttm_bo_vm_fault+0x325/0x570 [ttm]
[   30.125964]  __do_fault+0x19/0x11e
[   30.126255]  __handle_mm_fault+0xcd3/0x1260
[   30.126609]  handle_mm_fault+0x14c/0x310
[   30.126947]  __do_page_fault+0x28c/0x530
[   30.127282]  do_page_fault+0x32/0x270
[   30.127593]  async_page_fault+0x22/0x30
[   30.127922] RIP: 0033:0x7fe48aae39a8
[   30.128225] RSP: 002b:00007ffc21c4d928 EFLAGS: 00010206
[   30.128664] RAX: 00007fe457b73000 RBX: 000055cd4c1041a0 RCX:
00007fe457b73040
[   30.129259] RDX: 0000000000300000 RSI: 0000000000000000 RDI:
00007fe457b73000
[   30.129855] RBP: 0000000000000300 R08: 000000000000000c R09:
0000000100000000
[   30.130457] R10: 0000000000000001 R11: 0000000000000246 R12:
000055cd4c1041a0
[   30.131054] R13: 000055cd4bdfe990 R14: 000055cd4c104110 R15:
0000000000000400
[   30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f
ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff
<0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c
[   30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: ffffa7818222bba8
[   30.133836] ---[ end trace d4f1deb60784f40a ]---

This is based off of Linus' master branch at
c8a0739b185d11d6e2ca7ad9f5835841d1cfc765
Configs are at
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/commit/?h=rawhide&id=0be14662c54f49b4e640868b9d67df18d39edff0

Looks like a TTM regression due to:

0284f1ead87463bc17cf5e81a24fc65c052486f3
drm/ttm: add transparent huge page support for cached allocations v2

If the driver requests dma32 pages, we can end up trying to alloc huge
dma32 pages which triggers the oops. The bochs driver always requests
dma32 here.

I'll send a rough patch once I boot it.

Dave.
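
For anyone following along, the failure mode Dave describes can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the TTM code or Dave's patch: the flag names and bit values (`SK_DMA32`, `SK_HIGHMEM`, `SK_HUGE`) and all helper functions are hypothetical, and the sketch assumes that the huge-page flag set pulls in a high-memory zone bit that conflicts with a DMA32 request. The guarded variant mirrors the idea in the patch title quoted later in the thread, i.e. not attempting huge pages when dma32 is requested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; NOT the kernel's GFP values. */
enum {
    SK_DMA32   = 1 << 0, /* pages must come from the low 4 GiB zone */
    SK_HIGHMEM = 1 << 1, /* pages may come from the high-memory zone */
    SK_HUGE    = 1 << 2, /* ask for a huge page */
};

/* A request that names two different zones contradicts itself. */
static bool sk_zone_flags_valid(unsigned int flags)
{
    return !((flags & SK_DMA32) && (flags & SK_HIGHMEM));
}

/*
 * Unguarded variant: always layers the huge-page flag set on top of the
 * caller's base flags. Assumption: that set implies SK_HIGHMEM.
 */
static unsigned int sk_huge_flags_unguarded(unsigned int base)
{
    return base | SK_HUGE | SK_HIGHMEM;
}

/* Guarded variant: returns 0 ("skip the huge-page attempt") for DMA32. */
static unsigned int sk_huge_flags_guarded(unsigned int base)
{
    if (base & SK_DMA32)
        return 0;
    return base | SK_HUGE | SK_HIGHMEM;
}

/* Self-check that exercises both variants. */
static void sk_self_check(void)
{
    /* DMA32 plus the huge-page set yields a contradictory zone request. */
    assert(!sk_zone_flags_valid(sk_huge_flags_unguarded(SK_DMA32)));
    /* The guarded variant refuses to build huge-page flags for DMA32. */
    assert(sk_huge_flags_guarded(SK_DMA32) == 0);
    /* Without DMA32 the huge-page flags remain a valid request. */
    assert(sk_zone_flags_valid(sk_huge_flags_guarded(0)));
}
```

Calling `sk_self_check()` from a small `main()` runs the three checks. In this sketch a driver that always asks for dma32 pages, as bochs is said to do above, would always take the guarded early return.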

Hi Dave,

FYI only: it looks like this is not the only TTM regression in this cycle;
nouveau seems to suffer as well [1].
Adding ttm folks. Might be useful if we have an entry for ttm in
MAINTAINERS ...
-Daniel


A bit more investigation into the nouveau regression: it only shows up when Transparent Hugepages (CONFIG_TRANSPARENT_HUGEPAGE) are enabled. Thanks Dave for pointing me to that!

Yeah, sorry for that. I failed to handle the DMA32 case with transparent huge page support.

Dave has already come up with a fix, which should already be submitted.

Christian.


Hi Christian,

no problem, rc1 isn't even released, so hitting bugs is expected! Yet Dave's fix ("drm/ttm: don't attempt to use hugepages if dma32 requested (v2)" [1]) is not enough to fix the problem: with Dave's patch applied, I still see a backtrace similar to the one posted below in my first reply to this thread [2].


Greetings,

Tobias


[1] https://patchwork.freedesktop.org/patch/189812/

[2]:


[  171.559316] ------------[ cut here ]------------
[  171.559335] kernel BUG at mm/shmem.c:4334!
[  171.559342] invalid opcode: 0000 [#1] PREEMPT SMP
[  171.559344] Modules linked in: fuse rfcomm af_packet bnep uvcvideo rtsx_usb_ms arc4 memstick videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_core btusb btrtl btbcm msr nls_iso8859_1 nls_cp437 vfat fat joydev hid_multitouch snd_hda_codec_hdmi ath10k_pci ath10k_core ath iTCO_wdt iTCO_vendor_support mac80211 snd_hda_codec_realtek intel_rapl snd_hda_codec_generic cfg80211 r8169 mii x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul intel_wmi_thunderbolt crc32c_intel acer_wmi sparse_keymap ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep pcbc wmi_bmof snd_pcm snd_timer aesni_intel snd aes_x86_64 crypto_simd glue_helper cryptd pcspkr soundcore mei_me i2c_i801 mei thermal shpchp idma64 hci_uart serdev btqca intel_pch_thermal
[  171.559401]  intel_lpss_pci btintel ucsi_acpi bluetooth typec_ucsi tps6598x ecdh_generic ac typec battery rfkill tpm_crb tpm_tis tpm_tis_core pinctrl_sunrisepoint intel_lpss_acpi pinctrl_intel intel_lpss tpm acpi_pad hid_generic usbhid rtsx_usb_sdmmc mmc_core rtsx_usb nouveau mxm_wmi ttm serio_raw i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd usbcore drm i2c_hid wmi video button sg efivarfs
[  171.559437] CPU: 4 PID: 131 Comm: kworker/u16:2 Tainted: G        W        4.14.0-desktop-rc0-debug+ #4
[  171.559439] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.06 07/05/2017
[  171.559445] Workqueue: ttm_swap ttm_shrink_work [ttm]
[  171.559448] task: ffff8f8c33b66400 task.stack: ffffa05fc1740000
[  171.559453] RIP: 0010:shmem_read_mapping_page_gfp+0x4c/0x50
[  171.559454] RSP: 0018:ffffa05fc1743d00 EFLAGS: 00010206
[  171.559457] RAX: ffff8f8c30f34580 RBX: ffff8f8bcec83400 RCX: fffff74489520f80
[  171.559459] RDX: 00000000014200ca RSI: 0000000000000000 RDI: ffff8f8c30f34360
[  171.559460] RBP: ffffa05fc1743d08 R08: 0000000000000000 R09: ffffffffffffffff
[  171.559462] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[  171.559463] R13: ffff8f8c30f34580 R14: ffff8f8c323b7538 R15: ffff8f8c34ee7800
[  171.559465] FS:  0000000000000000(0000) GS:ffff8f8c3ed00000(0000) knlGS:0000000000000000
[  171.559467] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.559469] CR2: 00005583f0c37dc0 CR3: 00000001abc10004 CR4: 00000000003606e0
[  171.559470] Call Trace:
[  171.559476]  ttm_tt_swapout+0x156/0x280 [ttm]
[  171.559482]  ttm_bo_swapout+0x217/0x260 [ttm]
[  171.559486]  ? serio_raw_write+0x50/0x100 [serio_raw]
[  171.559491]  ttm_shrink+0xab/0xe0 [ttm]
[  171.559496]  ttm_shrink_work+0x14/0x20 [ttm]
[  171.559499]  process_one_work+0x1e3/0x400
[  171.559501]  ? process_one_work+0x17c/0x400
[  171.559506]  worker_thread+0x30/0x3a0
[  171.559510]  kthread+0x152/0x170
[  171.559512]  ? process_one_work+0x400/0x400
[  171.559514]  ? kthread_create_worker_on_cpu+0x40/0x40
[  171.559518]  ret_from_fork+0x24/0x30
[  171.559523] Code: 8d 55 f8 6a 00 45 31 c9 b9 01 00 00 00 e8 6d f0 ff ff 85 c0 5a 59 74 04 48 98 c9 c3 48 8b 7d f8 e8 fa de fd ff 48 8b 45 f8 c9 c3 <0f> 0b 66 90 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 48 83 ec
[  171.559567] RIP: shmem_read_mapping_page_gfp+0x4c/0x50 RSP: ffffa05fc1743d00
[  171.559570] ---[ end trace 3c2332c10029c3cd ]---
[  185.835630] swiotlb_tbl_map_single: 24 callbacks suppressed
[  185.835632] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[  185.835635] swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
[  185.835637] CPU: 1 PID: 68 Comm: kworker/1:1 Tainted: G      D W        4.14.0-desktop-rc0-debug+ #4
[  185.835638] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.06 07/05/2017
[  185.835642] Workqueue: pm pm_runtime_work
[  185.835644] Call Trace:
[  185.835648]  dump_stack+0x8e/0xcb
[  185.835650]  swiotlb_alloc_coherent+0xe4/0x160
[  185.835653]  x86_swiotlb_alloc_coherent+0x3e/0x50
[  185.835659]  ttm_dma_pool_get_pages+0x1e1/0x5d0 [ttm]
[  185.835664]  ttm_dma_populate+0x244/0x330 [ttm]
[  185.835692]  nouveau_ttm_tt_populate+0x142/0x1f0 [nouveau]
[  185.835695]  ttm_tt_bind+0x23/0x50 [ttm]
[  185.835698]  ttm_bo_handle_move_mem+0x570/0x5a0 [ttm]
[  185.835720]  ? gf119_disp_chan_uevent_fini+0x60/0x70 [nouveau]
[  185.835724]  ttm_bo_evict+0x146/0x340 [ttm]
[  185.835727]  ? retint_kernel+0x2d/0x2d
[  185.835732]  ttm_mem_evict_first+0x14c/0x1b0 [ttm]
[  185.835736]  ttm_bo_force_list_clean+0x68/0x130 [ttm]
[  185.835739]  ? pci_pm_runtime_resume+0xa0/0xa0
[  185.835741]  ttm_bo_evict_mm+0x21/0x50 [ttm]
[  185.835762]  nouveau_do_suspend+0x7c/0x2b0 [nouveau]
[  185.835780]  nouveau_pmops_runtime_suspend+0x54/0xc0 [nouveau]
[  185.835783]  pci_pm_runtime_suspend+0x5a/0x170
[  185.835785]  ? pci_pm_runtime_resume+0xa0/0xa0
[  185.835787]  __rpm_callback+0xb4/0x1e0
[  185.835789]  ? pci_pm_runtime_resume+0xa0/0xa0
[  185.835791]  rpm_callback+0x1f/0x80
[  185.835793]  ? pci_pm_runtime_resume+0xa0/0xa0
[  185.835795]  rpm_suspend+0x119/0x530
[  185.835797]  ? pm_runtime_work+0x19/0xc0
[  185.835799]  pm_runtime_work+0x76/0xc0
[  185.835802]  process_one_work+0x1e3/0x400
[  185.835803]  ? process_one_work+0x17c/0x400
[  185.835807]  worker_thread+0x30/0x3a0
[  185.835809]  kthread+0x152/0x170
[  185.835811]  ? process_one_work+0x400/0x400
[  185.835813]  ? kthread_create_worker_on_cpu+0x40/0x40
[  185.835815]  ret_from_fork+0x24/0x30

...




Greetings,

Tobias


Greetings,

Tobias


[1]:


[  404.918139] ------------[ cut here ]------------
[  404.918147] kernel BUG at mm/shmem.c:4334!
[  404.918152] invalid opcode: 0000 [#2] PREEMPT SMP
[  404.918157] Modules linked in: rfcomm af_packet bnep uvcvideo
videobuf2_vmalloc videobuf2_memops rtsx_usb_ms videobuf2_v4l2 memstick
videodev videobuf2_core btusb btrtl btbcm arc4 msr snd_hda_codec_hdmi
snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 nls_cp437
hid_multitouch vfat fat iTCO_wdt iTCO_vendor_support intel_rapl
x86_pkg_temp_thermal intel_powerclamp ath10k_pci coretemp ath10k_core ath
kvm_intel mac80211 kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel r8169 mii snd_hda_intel pcbc cfg80211 snd_hda_codec
wmi_bmof snd_hda_core snd_hwdep acer_wmi sparse_keymap snd_pcm
intel_wmi_thunderbolt aesni_intel snd_timer aes_x86_64 crypto_simd
glue_helper cryptd snd soundcore idma64 pcspkr i2c_i801 mei_me shpchp
intel_pch_thermal mei intel_lpss_pci ucsi_acpi
[  404.918239]  typec_ucsi thermal hci_uart serdev btqca tps6598x typec
btintel tpm_crb ac bluetooth tpm_tis intel_lpss_acpi tpm_tis_core
ecdh_generic battery pinctrl_sunrisepoint rfkill pinctrl_intel intel_lpss
tpm acpi_pad hid_generic usbhid rtsx_usb_sdmmc mmc_core rtsx_usb nouveau
mxm_wmi ttm serio_raw i915 i2c_algo_bit drm_kms_helper syscopyarea xhci_pci
sysfillrect sysimgblt fb_sys_fops xhci_hcd drm usbcore i2c_hid wmi video
button sg efivarfs
[  404.918289] CPU: 1 PID: 2739 Comm: Civ6 Tainted: G D
4.14.0-desktop-rc0-debug+ #1
[  404.918295] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.06
07/05/2017
[  404.918301] task: ffff9b3b49c7e280 task.stack: ffffb9a2035b4000
[  404.918308] RIP: 0010:shmem_read_mapping_page_gfp+0x4c/0x50
[  404.918313] RSP: 0018:ffffb9a2035b79c0 EFLAGS: 00010206
[  404.918317] RAX: ffff9b3b4dd17700 RBX: ffff9b3b759cb900 RCX:
ffffe4f10ac8b2c0
[  404.918322] RDX: 00000000014200ca RSI: 0000000000000000 RDI:
ffff9b3b4dd174e0
[  404.918327] RBP: ffffb9a2035b79c8 R08: 0000000000000000 R09:
ffffffffffffffff
[  404.918332] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[  404.918337] R13: ffff9b3b4dd17700 R14: ffff9b3b7253eb38 R15:
ffff9b3b75341000
[  404.918343] FS:  00007fa952f69700(0000) GS:ffff9b3b7ec40000(0000)
knlGS:0000000000000000
[  404.918348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  404.918353] CR2: 0000560940b5d000 CR3: 0000000193e94004 CR4:
00000000003606e0
[  404.918358] Call Trace:
[  404.918365]  ttm_tt_swapout+0x156/0x280 [ttm]
[  404.918371]  ttm_bo_swapout+0x217/0x260 [ttm]
[  404.918379]  ttm_shrink+0xab/0xe0 [ttm]
[  404.918384]  ttm_mem_global_alloc_zone.constprop.6+0xd1/0x140 [ttm]
[  404.918391]  ttm_mem_global_alloc+0x11/0x20 [ttm]
[  404.918397]  ttm_bo_init_reserved+0x47/0x4f0 [ttm]
[  404.918403]  ttm_bo_init+0x29/0xa0 [ttm]
[  404.918430]  ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
[  404.918454]  nouveau_bo_new+0x3f5/0x550 [nouveau]
[  404.918474]  ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
[  404.918495]  nouveau_gem_new+0x48/0x100 [nouveau]
[  404.918514]  nouveau_gem_ioctl_new+0x45/0xc0 [nouveau]
[  404.918532]  ? nouveau_gem_new+0x100/0x100 [nouveau]
[  404.918543]  drm_ioctl_kernel+0x58/0xb0 [drm]
[  404.918551]  drm_ioctl+0x315/0x3d0 [drm]
[  404.918568]  ? nouveau_gem_new+0x100/0x100 [nouveau]
[  404.918576]  ? trace_hardirqs_on+0xd/0x10
[  404.918595]  nouveau_drm_ioctl+0x6d/0xb0 [nouveau]
[  404.918601]  do_vfs_ioctl+0x8e/0x660
[  404.918605]  ? __fget+0x102/0x1f0
[  404.918609]  SyS_ioctl+0x74/0x80
[  404.918615]  entry_SYSCALL_64_fastpath+0x23/0x9a
[  404.918619] RIP: 0033:0x7fa96396d2f7
[  404.918622] RSP: 002b:00007fa952f43728 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  404.918628] RAX: ffffffffffffffda RBX: 00007fa93cad59c0 RCX:
00007fa96396d2f7
[  404.918633] RDX: 00007fa952f43780 RSI: 00000000c0306480 RDI:
0000000000000022
[  404.918638] RBP: 0000000000100000 R08: 0000000000000000 R09:
0000000000000000
[  404.918643] R10: 00007fa852fa0260 R11: 0000000000000246 R12:
00007fa91c053370
[  404.918648] R13: 0000000000100000 R14: 00007fa93e579a70 R15:
00007fa8530a1aa0
[  404.918656] Code: 8d 55 f8 6a 00 45 31 c9 b9 01 00 00 00 e8 6d f0 ff ff
85 c0 5a 59 74 04 48 98 c9 c3 48 8b 7d f8 e8 fa de fd ff 48 8b 45 f8 c9 c3
<0f> 0b 66 90 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 48 83 ec
[  404.918698] RIP: shmem_read_mapping_page_gfp+0x4c/0x50 RSP:
ffffb9a2035b79c0
[  404.918711] ---[ end trace 53b254d8157cf0e7 ]---



