Re: The issue with page allocation 5.3 rc1-rc2 (seems drm culprit here)

From: Mikhail Gavrilov
Date: Sat Aug 10 2019 - 12:08:35 EST


On Fri, 9 Aug 2019 at 23:55, Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:
> Finally initial problem "gnome-shell: page allocation failure:
> order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
> nodemask=(null),cpuset=/,mems_allowed=0" did not happens anymore with
> latest version of the patch (I tested more than 23 hours)
>
> But I hit a new problem:
>
> [73808.088801] ------------[ cut here ]------------
> [73808.088806] DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock)
> [73808.088813] WARNING: CPU: 8 PID: 1348877 at
> kernel/locking/mutex.c:757 __ww_mutex_lock.constprop.0+0xb0f/0x10c0

[pruned]

> So I needed to report it separately (in another thread) or we continue here?

Today after reboot issue "DEBUG LOCKS
WARN_ON(ww_ctx->contending_lock)" happened again.

--
Best Regards,
Mike Gavrilov.

[ 5406.584851] ------------[ cut here ]------------
[ 5406.584855] DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock)
[ 5406.584862] WARNING: CPU: 2 PID: 4865 at kernel/locking/mutex.c:757 __ww_mutex_lock.constprop.0+0xb0f/0x10c0
[ 5406.584865] Modules linked in: macvtap macvlan tap rfcomm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio kvm_amd snd_hda_codec_hdmi snd_hda_intel kvm rtwpci snd_hda_codec rtw88 irqbypass snd_hda_core snd_usb_audio mac80211 snd_usbmidi_lib crct10dif_pclmul uvcvideo snd_hwdep snd_rawmidi crc32_pclmul btusb videobuf2_vmalloc videobuf2_memops snd_seq videobuf2_v4l2 btrtl btbcm ghash_clmulni_intel snd_seq_device btintel videobuf2_common xpad eeepc_wmi joydev ff_memless
[ 5406.584895] bluetooth cfg80211 snd_pcm asus_wmi videodev snd_timer sparse_keymap video wmi_bmof snd ecdh_generic mc ecc soundcore ccp k10temp sp5100_tco rfkill libarc4 i2c_piix4 gpio_amdpt gpio_generic acpi_cpufreq binfmt_misc ip_tables hid_logitech_hidpp amdgpu crc32c_intel amd_iommu_v2 gpu_sched ttm drm_kms_helper igb drm nvme dca hid_logitech_dj i2c_algo_bit nvme_core wmi pinctrl_amd
[ 5406.584915] CPU: 2 PID: 4865 Comm: firefox:cs0 Not tainted 5.3.0-0.rc3.git1.2.fc31.x86_64 #1
[ 5406.584917] Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 2406 06/21/2019
[ 5406.584920] RIP: 0010:__ww_mutex_lock.constprop.0+0xb0f/0x10c0
[ 5406.584922] Code: 28 00 74 28 e8 42 29 a6 ff 85 c0 74 1f 8b 05 f8 6a e0 00 85 c0 75 15 48 c7 c6 70 35 32 92 48 c7 c7 f0 67 30 92 e8 e9 84 5c ff <0f> 0b 4d 89 74 24 28 b8 dd ff ff ff 65 48 8b 14 25 40 8e 01 00 48
[ 5406.584924] RSP: 0018:ffffb738cca4f760 EFLAGS: 00010286
[ 5406.584926] RAX: 0000000000000000 RBX: ffff8e1732e13300 RCX: 0000000000000000
[ 5406.584927] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000246
[ 5406.584929] RBP: ffffb738cca4f820 R08: 0000000000000000 R09: 0000000000000000
[ 5406.584931] R10: ffffffff93d3f740 R11: 0000000093d3f373 R12: ffffb738cca4fb90
[ 5406.584932] R13: ffffb738cca4f7c0 R14: ffff8e172e0fb258 R15: ffff8e172e0fb260
[ 5406.584934] FS: 00007fc2d5c6b700(0000) GS:ffff8e18ba400000(0000) knlGS:0000000000000000
[ 5406.584935] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5406.584937] CR2: 00007ff54bbd0000 CR3: 00000005ad12a000 CR4: 00000000003406e0
[ 5406.584938] Call Trace:
[ 5406.584943] ? _raw_spin_unlock_irq+0x29/0x40
[ 5406.584951] ? ttm_mem_evict_first+0x1ed/0x4f0 [ttm]
[ 5406.584955] ? ww_mutex_lock_interruptible+0x43/0xb0
[ 5406.584957] ww_mutex_lock_interruptible+0x43/0xb0
[ 5406.584961] ttm_mem_evict_first+0x1ed/0x4f0 [ttm]
[ 5406.584969] ttm_bo_mem_space+0x229/0x2c0 [ttm]
[ 5406.584974] ttm_bo_validate+0xe5/0x190 [ttm]
[ 5406.584979] ? lockdep_hardirqs_on+0xf0/0x180
[ 5406.585033] amdgpu_cs_bo_validate+0xaa/0x1b0 [amdgpu]
[ 5406.585082] amdgpu_cs_validate+0x3b/0x260 [amdgpu]
[ 5406.585131] amdgpu_cs_list_validate+0x110/0x180 [amdgpu]
[ 5406.585184] amdgpu_cs_ioctl+0x5a9/0x1d10 [amdgpu]
[ 5406.585189] ? sched_clock+0x5/0x10
[ 5406.585247] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[ 5406.585260] drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 5406.585271] drm_ioctl+0x208/0x390 [drm]
[ 5406.585316] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[ 5406.585319] ? sched_clock_cpu+0xc/0xc0
[ 5406.585322] ? lockdep_hardirqs_on+0xf0/0x180
[ 5406.585366] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 5406.585371] do_vfs_ioctl+0x411/0x750
[ 5406.585375] ksys_ioctl+0x5e/0x90
[ 5406.585378] __x64_sys_ioctl+0x16/0x20
[ 5406.585381] do_syscall_64+0x5c/0xb0
[ 5406.585385] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 5406.585387] RIP: 0033:0x7fc30b31007b
[ 5406.585390] Code: 0f 1e fa 48 8b 05 0d 9e 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 9d 0c 00 f7 d8 64 89 01 48
[ 5406.585392] RSP: 002b:00007fc2d5c6a118 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 5406.585394] RAX: ffffffffffffffda RBX: 00007fc2d5c6aaf0 RCX: 00007fc30b31007b
[ 5406.585396] RDX: 00007fc2d5c6aaf0 RSI: 00000000c0186444 RDI: 000000000000001f
[ 5406.585397] RBP: 00007fc2d5c6aa70 R08: 00007fc2d5c6ad10 R09: 0000000000000030
[ 5406.585398] R10: 00007fc2d5c6ad10 R11: 0000000000000246 R12: 00000000c0186444
[ 5406.585400] R13: 000000000000001f R14: 00007fc2d5c6aaf0 R15: 000000000000001f
[ 5406.585404] irq event stamp: 156924947
[ 5406.585406] hardirqs last enabled at (156924947): [<ffffffff91b24d19>] _raw_spin_unlock_irq+0x29/0x40
[ 5406.585408] hardirqs last disabled at (156924946): [<ffffffff91b1d188>] __schedule+0xc8/0x900
[ 5406.585411] softirqs last enabled at (156923776): [<ffffffff91e0035d>] __do_softirq+0x35d/0x45d
[ 5406.585414] softirqs last disabled at (156923765): [<ffffffff910f1e37>] irq_exit+0xf7/0x100
[ 5406.585415] ---[ end trace 2b58e1013c283539 ]---
[ 7417.187431] Youngblood_x64v (14657) used greatest stack depth: 10504 bytes left
[ 7417.190872] Youngblood_x64v (14654) used greatest stack depth: 10248 bytes left
[11664.949437] nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based firewall rule not found. Use the iptables CT target to attach helpers instead.
[15298.698293] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[19656.044113] show_signal_msg: 1 callbacks suppressed