Re: Regression in TTM driver w/Linus' master

From: Tobias Klausmann
Date: Fri Nov 24 2017 - 10:17:34 EST



On 11/24/17 3:54 PM, Daniel Vetter wrote:
On Thu, Nov 23, 2017 at 03:24:38PM +0100, Tobias Klausmann wrote:
On 11/23/17 2:58 AM, Dave Airlie wrote:
On 23 November 2017 at 11:17, Laura Abbott <labbott@xxxxxxxxxx> wrote:
Hi,

Fedora QA testing reported a panic when booting up VMs
using qemu vga drivers
(https://paste.fedoraproject.org/paste/498yRWTCJv2LKIrmj4EliQ)

[ 30.108507] ------------[ cut here ]------------
[ 30.108920] kernel BUG at ./include/linux/gfp.h:408!
[ 30.109356] invalid opcode: 0000 [#1] SMP
[ 30.109700] Modules linked in: fuse nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw
iptable_security ebtable_filter ebtables ip6table_filter ip6_tables
snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass
ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm
joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore
parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp
mrp stp llc virtio_net
[ 30.115605] virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio
ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop
[ 30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted
4.15.0-0.rc0.git6.1.fc28.x86_64 #1
[ 30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.2-2.fc27 04/01/2014
[ 30.118866] task: ffff923a77e03380 task.stack: ffffa78182228000
[ 30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430
[ 30.119810] RSP: 0000:ffffa7818222bba8 EFLAGS: 00010202
[ 30.120250] RAX: 0000000000000001 RBX: 00000000014382c6 RCX:
0000000000000006
[ 30.120840] RDX: 0000000000000000 RSI: 0000000000000009 RDI:
0000000000000000
[ 30.121443] RBP: ffff923a760d6000 R08: 0000000000000000 R09:
0000000000000006
[ 30.122039] R10: 0000000000000040 R11: 0000000000000300 R12:
ffff923a729273c0
[ 30.122629] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff923a7483d400
[ 30.123223] FS: 00007fe48da7dac0(0000) GS:ffff923a7cc00000(0000)
knlGS:0000000000000000
[ 30.123896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.124373] CR2: 00007fe457b73000 CR3: 0000000078313000 CR4:
00000000000006f0
[ 30.124968] Call Trace:
[ 30.125186] ttm_pool_populate+0x19b/0x400 [ttm]
[ 30.125578] ttm_bo_vm_fault+0x325/0x570 [ttm]
[ 30.125964] __do_fault+0x19/0x11e
[ 30.126255] __handle_mm_fault+0xcd3/0x1260
[ 30.126609] handle_mm_fault+0x14c/0x310
[ 30.126947] __do_page_fault+0x28c/0x530
[ 30.127282] do_page_fault+0x32/0x270
[ 30.127593] async_page_fault+0x22/0x30
[ 30.127922] RIP: 0033:0x7fe48aae39a8
[ 30.128225] RSP: 002b:00007ffc21c4d928 EFLAGS: 00010206
[ 30.128664] RAX: 00007fe457b73000 RBX: 000055cd4c1041a0 RCX:
00007fe457b73040
[ 30.129259] RDX: 0000000000300000 RSI: 0000000000000000 RDI:
00007fe457b73000
[ 30.129855] RBP: 0000000000000300 R08: 000000000000000c R09:
0000000100000000
[ 30.130457] R10: 0000000000000001 R11: 0000000000000246 R12:
000055cd4c1041a0
[ 30.131054] R13: 000055cd4bdfe990 R14: 000055cd4c104110 R15:
0000000000000400
[ 30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f
ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff
<0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c
[ 30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: ffffa7818222bba8
[ 30.133836] ---[ end trace d4f1deb60784f40a ]---

This is based off of Linus' master branch at
c8a0739b185d11d6e2ca7ad9f5835841d1cfc765
Configs are at
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/commit/?h=rawhide&id=0be14662c54f49b4e640868b9d67df18d39edff0

Looks like a TTM regression due to:

0284f1ead87463bc17cf5e81a24fc65c052486f3
drm/ttm: add transparent huge page support for cached allocations v2

If the driver requests dma32 pages, we can end up trying to alloc huge
dma32 pages which triggers the oops. The bochs driver always requests
dma32 here.

I'll send a rough patch once I boot it.

Dave.
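
[Editorial sketch, not part of the original mail.] The BUG at
include/linux/gfp.h:408 appears to be the zone sanity check in gfp_zone().
The model below is a standalone approximation of that check, assuming the
4.15-era zone bit values and the GFP_ZONE_BAD table as I read them. All
SK_* and sk_* names are hypothetical, made up for this illustration. It
shows why adding the transparent-huge-page flags (which are assumed here to
bring in the HIGHMEM zone bit) to a DMA32 request gives a zone combination
the allocator rejects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zone-selector bits of a gfp mask. Values are an assumption based on
 * 4.15-era include/linux/gfp.h; the SK_ names are hypothetical. */
#define SK_GFP_DMA      0x01u
#define SK_GFP_HIGHMEM  0x02u
#define SK_GFP_DMA32    0x04u
#define SK_GFP_MOVABLE  0x08u
#define SK_GFP_ZONEMASK \
	(SK_GFP_DMA | SK_GFP_HIGHMEM | SK_GFP_DMA32 | SK_GFP_MOVABLE)

/* One bit per zone-bit combination that names no valid zone
 * (modeled on the kernel's GFP_ZONE_BAD). */
#define SK_GFP_ZONE_BAD ( \
	  1u << (SK_GFP_DMA | SK_GFP_HIGHMEM) \
	| 1u << (SK_GFP_DMA | SK_GFP_DMA32) \
	| 1u << (SK_GFP_DMA32 | SK_GFP_HIGHMEM) \
	| 1u << (SK_GFP_DMA | SK_GFP_DMA32 | SK_GFP_HIGHMEM) \
	| 1u << (SK_GFP_MOVABLE | SK_GFP_HIGHMEM | SK_GFP_DMA) \
	| 1u << (SK_GFP_MOVABLE | SK_GFP_DMA32 | SK_GFP_DMA) \
	| 1u << (SK_GFP_MOVABLE | SK_GFP_DMA32 | SK_GFP_HIGHMEM) \
	| 1u << (SK_GFP_MOVABLE | SK_GFP_DMA32 | SK_GFP_DMA | SK_GFP_HIGHMEM))

/* True when the zone bits in gfp_flags form an invalid combination,
 * i.e. the condition the kernel's VM_BUG_ON is assumed to test. */
bool sk_zone_bits_are_bad(uint32_t gfp_flags)
{
	unsigned int bit = gfp_flags & SK_GFP_ZONEMASK;

	return (SK_GFP_ZONE_BAD >> bit) & 1u;
}

/* Zone bits after a huge-page attempt: the driver's own zone bits plus
 * the HIGHMEM and MOVABLE bits that the huge-page flags are assumed to
 * add. Whether MOVABLE is cleared again afterwards does not change the
 * outcome for a DMA32 request. */
uint32_t sk_huge_zone_bits(uint32_t driver_zone_bits)
{
	uint32_t flags = driver_zone_bits | SK_GFP_HIGHMEM | SK_GFP_MOVABLE;

	return flags & SK_GFP_ZONEMASK;
}
```

In this model sk_zone_bits_are_bad(SK_GFP_DMA32) is false, while
sk_zone_bits_are_bad(sk_huge_zone_bits(SK_GFP_DMA32)) is true, which matches
the description above: a plain dma32 allocation is fine, a huge dma32
allocation is not.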

Hi Dave,

FYI only: it looks like this is not the only TTM regression in this cycle;
nouveau seems to suffer as well [1].
Adding ttm folks. Might be useful if we have an entry for ttm in
MAINTAINERS ...
-Daniel


A bit more investigation into the nouveau regression: it only shows up when Transparent Hugepages (CONFIG_TRANSPARENT_HUGEPAGE) are enabled. Thanks, Dave, for pointing me to that!


Greetings,

Tobias


Greetings,

Tobias


[1]:


[ 404.918139] ------------[ cut here ]------------
[ 404.918147] kernel BUG at mm/shmem.c:4334!
[ 404.918152] invalid opcode: 0000 [#2] PREEMPT SMP
[ 404.918157] Modules linked in: rfcomm af_packet bnep uvcvideo
videobuf2_vmalloc videobuf2_memops rtsx_usb_ms videobuf2_v4l2 memstick
videodev videobuf2_core btusb btrtl btbcm arc4 msr snd_hda_codec_hdmi
snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 nls_cp437
hid_multitouch vfat fat iTCO_wdt iTCO_vendor_support intel_rapl
x86_pkg_temp_thermal intel_powerclamp ath10k_pci coretemp ath10k_core ath
kvm_intel mac80211 kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel r8169 mii snd_hda_intel pcbc cfg80211 snd_hda_codec
wmi_bmof snd_hda_core snd_hwdep acer_wmi sparse_keymap snd_pcm
intel_wmi_thunderbolt aesni_intel snd_timer aes_x86_64 crypto_simd
glue_helper cryptd snd soundcore idma64 pcspkr i2c_i801 mei_me shpchp
intel_pch_thermal mei intel_lpss_pci ucsi_acpi
[ 404.918239] typec_ucsi thermal hci_uart serdev btqca tps6598x typec
btintel tpm_crb ac bluetooth tpm_tis intel_lpss_acpi tpm_tis_core
ecdh_generic battery pinctrl_sunrisepoint rfkill pinctrl_intel intel_lpss
tpm acpi_pad hid_generic usbhid rtsx_usb_sdmmc mmc_core rtsx_usb nouveau
mxm_wmi ttm serio_raw i915 i2c_algo_bit drm_kms_helper syscopyarea xhci_pci
sysfillrect sysimgblt fb_sys_fops xhci_hcd drm usbcore i2c_hid wmi video
button sg efivarfs
[ 404.918289] CPU: 1 PID: 2739 Comm: Civ6 Tainted: G D
4.14.0-desktop-rc0-debug+ #1
[ 404.918295] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.06
07/05/2017
[ 404.918301] task: ffff9b3b49c7e280 task.stack: ffffb9a2035b4000
[ 404.918308] RIP: 0010:shmem_read_mapping_page_gfp+0x4c/0x50
[ 404.918313] RSP: 0018:ffffb9a2035b79c0 EFLAGS: 00010206
[ 404.918317] RAX: ffff9b3b4dd17700 RBX: ffff9b3b759cb900 RCX:
ffffe4f10ac8b2c0
[ 404.918322] RDX: 00000000014200ca RSI: 0000000000000000 RDI:
ffff9b3b4dd174e0
[ 404.918327] RBP: ffffb9a2035b79c8 R08: 0000000000000000 R09:
ffffffffffffffff
[ 404.918332] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[ 404.918337] R13: ffff9b3b4dd17700 R14: ffff9b3b7253eb38 R15:
ffff9b3b75341000
[ 404.918343] FS: 00007fa952f69700(0000) GS:ffff9b3b7ec40000(0000)
knlGS:0000000000000000
[ 404.918348] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 404.918353] CR2: 0000560940b5d000 CR3: 0000000193e94004 CR4:
00000000003606e0
[ 404.918358] Call Trace:
[ 404.918365] ttm_tt_swapout+0x156/0x280 [ttm]
[ 404.918371] ttm_bo_swapout+0x217/0x260 [ttm]
[ 404.918379] ttm_shrink+0xab/0xe0 [ttm]
[ 404.918384] ttm_mem_global_alloc_zone.constprop.6+0xd1/0x140 [ttm]
[ 404.918391] ttm_mem_global_alloc+0x11/0x20 [ttm]
[ 404.918397] ttm_bo_init_reserved+0x47/0x4f0 [ttm]
[ 404.918403] ttm_bo_init+0x29/0xa0 [ttm]
[ 404.918430] ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
[ 404.918454] nouveau_bo_new+0x3f5/0x550 [nouveau]
[ 404.918474] ? nouveau_bo_invalidate_caches+0x10/0x10 [nouveau]
[ 404.918495] nouveau_gem_new+0x48/0x100 [nouveau]
[ 404.918514] nouveau_gem_ioctl_new+0x45/0xc0 [nouveau]
[ 404.918532] ? nouveau_gem_new+0x100/0x100 [nouveau]
[ 404.918543] drm_ioctl_kernel+0x58/0xb0 [drm]
[ 404.918551] drm_ioctl+0x315/0x3d0 [drm]
[ 404.918568] ? nouveau_gem_new+0x100/0x100 [nouveau]
[ 404.918576] ? trace_hardirqs_on+0xd/0x10
[ 404.918595] nouveau_drm_ioctl+0x6d/0xb0 [nouveau]
[ 404.918601] do_vfs_ioctl+0x8e/0x660
[ 404.918605] ? __fget+0x102/0x1f0
[ 404.918609] SyS_ioctl+0x74/0x80
[ 404.918615] entry_SYSCALL_64_fastpath+0x23/0x9a
[ 404.918619] RIP: 0033:0x7fa96396d2f7
[ 404.918622] RSP: 002b:00007fa952f43728 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 404.918628] RAX: ffffffffffffffda RBX: 00007fa93cad59c0 RCX:
00007fa96396d2f7
[ 404.918633] RDX: 00007fa952f43780 RSI: 00000000c0306480 RDI:
0000000000000022
[ 404.918638] RBP: 0000000000100000 R08: 0000000000000000 R09:
0000000000000000
[ 404.918643] R10: 00007fa852fa0260 R11: 0000000000000246 R12:
00007fa91c053370
[ 404.918648] R13: 0000000000100000 R14: 00007fa93e579a70 R15:
00007fa8530a1aa0
[ 404.918656] Code: 8d 55 f8 6a 00 45 31 c9 b9 01 00 00 00 e8 6d f0 ff ff
85 c0 5a 59 74 04 48 98 c9 c3 48 8b 7d f8 e8 fa de fd ff 48 8b 45 f8 c9 c3
<0f> 0b 66 90 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 48 83 ec
[ 404.918698] RIP: shmem_read_mapping_page_gfp+0x4c/0x50 RSP:
ffffb9a2035b79c0
[ 404.918711] ---[ end trace 53b254d8157cf0e7 ]---



