Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

From: Christian König
Date: Thu Jan 21 2021 - 10:05:08 EST


I still have no idea what's going on here.

The KASAN messages from the DC code are completely unrelated.

Please add the full dmesg to your bug report.

Christian.

On 20.01.21 at 01:59, Mikhail Gavrilov wrote:
On Fri, 15 Jan 2021 at 03:43, Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:
In rc4, the number of warnings has dropped dramatically.
The "kasan slab-out-of-bounds" and "DMA-API device driver failed to
check map error" errors no longer appear.
But "sleeping function called from invalid context at
include/linux/sched/mm.h:196" and "BUG: key ffff88810b0d9148 has not
been registered!" are still not fixed.
The second issue is Navi specific, because it started to happen with the
5.10 kernel after I replaced the Radeon VII with a 6900XT.

1.
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:196
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 500, name: systemd-udevd
1 lock held by systemd-udevd/500:
#0: ffff888107690258 (&dev->mutex){....}-{3:3}, at:
device_driver_attach+0xa3/0x250
CPU: 9 PID: 500 Comm: systemd-udevd Not tainted
5.11.0-0.rc4.129.fc34.x86_64+debug #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
Call Trace:
dump_stack+0xae/0xe5
___might_sleep.cold+0x150/0x17e
? dcn30_clock_source_create+0x53/0x110 [amdgpu]
kmem_cache_alloc_trace+0x23f/0x270
dcn30_clock_source_create+0x53/0x110 [amdgpu]
dcn30_create_resource_pool+0x998/0x4890 [amdgpu]
? dcn30_calc_max_scaled_time+0x40/0x40 [amdgpu]
? lock_is_held_type+0xb8/0xf0
? unpoison_range+0x3a/0x60
? ____kasan_kmalloc.constprop.0+0x84/0xa0
? dc_create_resource_pool+0x26e/0x5e0 [amdgpu]
dc_create_resource_pool+0x26e/0x5e0 [amdgpu]
dc_create+0x636/0x1bc0 [amdgpu]
? lock_acquire+0x2dd/0x7a0
? sched_clock+0x5/0x10
? sched_clock_cpu+0x18/0x170
? find_held_lock+0x33/0x110
? dc_create_state+0xa0/0xa0 [amdgpu]
? lock_downgrade+0x6b0/0x6b0
? module_assert_mutex_or_preempt+0x3e/0x70
? lock_is_held_type+0xb8/0xf0
? unpoison_range+0x3a/0x60
? ____kasan_kmalloc.constprop.0+0x84/0xa0
amdgpu_dm_init.isra.0+0x479/0x640 [amdgpu]
? vprintk_emit+0x1c0/0x460
? dev_vprintk_emit+0x2d8/0x31a
? sched_clock+0x5/0x10
? dm_resume+0x13b0/0x13b0 [amdgpu]
? dev_attr_show.cold+0x35/0x35
? lock_downgrade+0x6b0/0x6b0
? dev_printk_emit+0x8c/0xa8
? dev_vprintk_emit+0x31a/0x31a
? wait_for_completion_io+0x240/0x240
? __dev_printk+0x71/0xdf
? smu_hw_init.cold+0x16b/0x18a [amdgpu]
? smu_suspend+0x240/0x240 [amdgpu]
? navi10_ih_irq_init+0xea3/0x2420 [amdgpu]
dm_hw_init+0xe/0x20 [amdgpu]
amdgpu_device_init.cold+0x3031/0x4940 [amdgpu]
? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
? pci_bus_read_config_byte+0x140/0x140
? do_pci_enable_device+0x1f8/0x260
? pci_find_saved_ext_cap+0x110/0x110
? pci_enable_bridge+0xf9/0x1e0
? pci_dev_check_d3cold+0x107/0x250
? pci_enable_device_flags+0x201/0x340
amdgpu_driver_load_kms+0x167/0x8a0 [amdgpu]
amdgpu_pci_probe+0x235/0x360 [amdgpu]
? amdgpu_pci_remove+0xd0/0xd0 [amdgpu]
local_pci_probe+0xd8/0x170
pci_device_probe+0x318/0x5c0
? kernfs_create_link+0x16c/0x230
? pci_device_remove+0x1d0/0x1d0
really_probe+0x224/0xc40
driver_probe_device+0x1f2/0x380
device_driver_attach+0x1df/0x250
__driver_attach+0xf6/0x260
? device_driver_attach+0x250/0x250
bus_for_each_dev+0x114/0x180
? subsys_dev_iter_exit+0x10/0x10
bus_add_driver+0x352/0x570
driver_register+0x20f/0x390
? __pci_register_driver+0x13a/0x210
? 0xffffffffc1d8d000
do_one_initcall+0xfb/0x530
? perf_trace_initcall_level+0x3d0/0x3d0
? __memset+0x2b/0x30
? unpoison_range+0x3a/0x60
do_init_module+0x1ce/0x7a0
load_module+0x9841/0xa380
? module_frob_arch_sections+0x20/0x20
? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
? sched_clock_cpu+0x18/0x170
? sched_clock+0x5/0x10
? lock_acquire+0x2dd/0x7a0
? sched_clock+0x5/0x10
? lock_is_held_type+0xb8/0xf0
? __do_sys_init_module+0x18b/0x220
__do_sys_init_module+0x18b/0x220
? load_module+0xa380/0xa380
? ktime_get_coarse_real_ts64+0x12f/0x160
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f2c109da07e
Code: 48 8b 0d f5 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d c2 1d 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc84d33f88 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055b87f8260a0 RCX: 00007f2c109da07e
RDX: 000055b87f834060 RSI: 0000000001e2cbf6 RDI: 00007f2c0b7e0010
RBP: 00007f2c0b7e0010 R08: 000055b87f8281e0 R09: 00007ffc84d30a26
R10: 000055bd2404cc18 R11: 0000000000000246 R12: 000055b87f834060
R13: 000055b87f831ca0 R14: 0000000000000000 R15: 000055b87f832640
[drm] Display Core initialized with v3.2.116!
[drm] DMUB hardware initialized: version=0x02000001
usb 1-3.2: Device not responding to setup address.
usb 1-3.2: device not accepting address 5, error -71
[drm] REG_WAIT timeout 1us * 100000 tries - mpc2_assert_idle_mpcc line:480


2.
BUG: key ffff88810b0d9148 has not been registered!
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 25 PID: 500 at kernel/locking/lockdep.c:4618
lockdep_init_map_waits+0x592/0x770
Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 gpu_sched
drm_kms_helper cec crct10dif_pclmul crc32_pclmul crc32c_intel drm
ghash_clmulni_intel ccp igb nvme dca nvme_core i2c_algo_bit xhci_pci
xhci_pci_renesas wmi pinctrl_amd fuse
CPU: 25 PID: 500 Comm: systemd-udevd Tainted: G W
--------- --- 5.11.0-0.rc4.129.fc34.x86_64+debug #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 2802 10/21/2020
RIP: 0010:lockdep_init_map_waits+0x592/0x770
Code: 08 84 d2 0f 85 d8 01 00 00 8b 3d e1 02 38 04 85 ff 0f 85 7e fc
ff ff 48 c7 c6 e0 04 ca 8e 48 c7 c7 40 fd c9 8e e8 01 8e 23 02 <0f> 0b
e9 64 fc ff ff 48 89 df 44 89 4c 24 0c 44 89 44 24 08 48 89
RSP: 0018:ffffc900029bef88 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000537de7
RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8886f9fe72ab
R10: ffffed10df3fce55 R11: 0000000000000001 R12: ffff88810b0d9148
R13: 0000000000000000 R14: ffffffff8edbda60 R15: ffff88810b0db690
FS: 00007f2c0fdda140(0000) GS:ffff8886f9e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b8800aec68 CR3: 0000000127fd0000 CR4: 0000000000350ee0
Call Trace:
? lockdep_hardirqs_on+0x75/0xf0
__kernfs_create_file+0x102/0x2f0
sysfs_add_file_mode_ns+0x1af/0x500
sysfs_create_bin_file+0x100/0x160
? lock_is_held_type+0xb8/0xf0
? sysfs_add_file_to_group+0x150/0x150
? static_obj+0x8a/0xc0
? lockdep_init_map_waits+0x2a2/0x770
hdcp_create_workqueue+0x879/0xb50 [amdgpu]
amdgpu_dm_init.isra.0.cold+0x7f2/0x374c [amdgpu]
? vprintk_emit+0x140/0x460
? dev_vprintk_emit+0x2d8/0x31a
? sched_clock+0x5/0x10
? dm_resume+0x13b0/0x13b0 [amdgpu]
? dev_attr_show.cold+0x35/0x35
? psp_set_srm+0x250/0x250 [amdgpu]
? hdcp_update_display+0x5b0/0x5b0 [amdgpu]
? lock_downgrade+0x6b0/0x6b0
? dev_printk_emit+0x8c/0xa8
? dev_vprintk_emit+0x31a/0x31a
? wait_for_completion_io+0x240/0x240
? __dev_printk+0x71/0xdf
? smu_hw_init.cold+0x16b/0x18a [amdgpu]
? smu_suspend+0x240/0x240 [amdgpu]
? navi10_ih_irq_init+0xea3/0x2420 [amdgpu]
dm_hw_init+0xe/0x20 [amdgpu]
amdgpu_device_init.cold+0x3031/0x4940 [amdgpu]
? amdgpu_device_cache_pci_state+0xf0/0xf0 [amdgpu]
? pci_bus_read_config_byte+0x140/0x140
? do_pci_enable_device+0x1f8/0x260
? pci_find_saved_ext_cap+0x110/0x110
? pci_enable_bridge+0xf9/0x1e0
? pci_dev_check_d3cold+0x107/0x250
? pci_enable_device_flags+0x201/0x340
amdgpu_driver_load_kms+0x167/0x8a0 [amdgpu]
amdgpu_pci_probe+0x235/0x360 [amdgpu]
? amdgpu_pci_remove+0xd0/0xd0 [amdgpu]
local_pci_probe+0xd8/0x170
pci_device_probe+0x318/0x5c0
? kernfs_create_link+0x16c/0x230
? pci_device_remove+0x1d0/0x1d0
really_probe+0x224/0xc40
driver_probe_device+0x1f2/0x380
device_driver_attach+0x1df/0x250
__driver_attach+0xf6/0x260
? device_driver_attach+0x250/0x250
bus_for_each_dev+0x114/0x180
? subsys_dev_iter_exit+0x10/0x10
bus_add_driver+0x352/0x570
driver_register+0x20f/0x390
? __pci_register_driver+0x13a/0x210
? 0xffffffffc1d8d000
do_one_initcall+0xfb/0x530
? perf_trace_initcall_level+0x3d0/0x3d0
? __memset+0x2b/0x30
? unpoison_range+0x3a/0x60
do_init_module+0x1ce/0x7a0
load_module+0x9841/0xa380
? module_frob_arch_sections+0x20/0x20
? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
? sched_clock_cpu+0x18/0x170
? sched_clock+0x5/0x10
? lock_acquire+0x2dd/0x7a0
? sched_clock+0x5/0x10
? lock_is_held_type+0xb8/0xf0
? __do_sys_init_module+0x18b/0x220
__do_sys_init_module+0x18b/0x220
? load_module+0xa380/0xa380
? ktime_get_coarse_real_ts64+0x12f/0x160
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f2c109da07e
Code: 48 8b 0d f5 1d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d c2 1d 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc84d33f88 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055b87f8260a0 RCX: 00007f2c109da07e
RDX: 000055b87f834060 RSI: 0000000001e2cbf6 RDI: 00007f2c0b7e0010
RBP: 00007f2c0b7e0010 R08: 000055b87f8281e0 R09: 00007ffc84d30a26
R10: 000055bd2404cc18 R11: 0000000000000246 R12: 000055b87f834060
R13: 000055b87f831ca0 R14: 0000000000000000 R15: 000055b87f832640
irq event stamp: 593331
hardirqs last enabled at (593331): [<ffffffff8c3602f0>]
console_unlock+0x7c0/0x9a0
hardirqs last disabled at (593330): [<ffffffff8c3601e8>]
console_unlock+0x6b8/0x9a0
softirqs last enabled at (593162): [<ffffffff8e801112>]
asm_call_irq_on_stack+0x12/0x20
softirqs last disabled at (593157): [<ffffffff8e801112>]
asm_call_irq_on_stack+0x12/0x20
---[ end trace 37dc3a4a3aa1704a ]---

The issue with switching off the monitor still happens too, but the
messages in the logs have become more detailed:
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
amdgpu 0000:0b:00.0: amdgpu: 0000000087613007 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin
framebuffer with error -12
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -4!

I hope "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the
buffer list -4!" gives an idea of what happened.
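
For reference, the negative numbers in these messages are kernel errno values with the sign flipped. The snippet below is only a minimal sketch for decoding them; it assumes a Linux host where Python's errno table matches the kernel's numbering for these codes, and decode_kernel_error is a hypothetical helper name, not part of any kernel or amdgpu tooling:

```python
import errno
import os


def decode_kernel_error(code: int) -> str:
    """Translate a negative kernel return code (e.g. -12) into its errno name and message.

    Hypothetical helper for illustration only.
    """
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names such as 'ENOMEM'.
    name = errno.errorcode.get(number, "UNKNOWN")
    # os.strerror returns the platform's description for the errno value.
    return f"{code}: {name} ({os.strerror(number)})"


# The two codes that appear in the log excerpts above.
for code in (-12, -4):
    print(decode_kernel_error(code))
```

With that mapping, -12 in the "Failed to pin framebuffer" message is ENOMEM and -4 in the "Failed to process the buffer list" message is EINTR.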

Full kernel log is here: https://pastebin.com/nX69zgvf