Re: [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12

From: Christian König
Date: Mon Jan 11 2021 - 15:46:56 EST


Hi Mike,

On 11.01.21 at 20:23, Mikhail Gavrilov wrote:
On Mon, 11 Jan 2021 at 19:01, Christian König <christian.koenig@xxxxxxx> wrote:

Changing the page table attributes while releasing memory might sleep.
So we can't use a spinlock here.

Thanks for the report, a patch to fix this is on the mailing list now.
Can you also look at the first trace?

Unfortunately not, that's DC stuff. The easiest thing is to open a bug report and assign it to our DC team.

Here is the same error message, "sleeping function called from invalid
context", and a lot of [amdgpu] code.

[SNIP]

-12 is just -ENOMEM. Looks like a memory leak to me, maybe caused by
the problem above, maybe something completely unrelated.
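For reference, in-kernel functions report failure as a negative errno value. That means the -12 coming out of the pin path is -ENOMEM (ENOMEM is 12 on Linux). Below is a minimal C sketch of that convention. `describe_kernel_ret` is a hypothetical helper name for illustration, not a kernel API.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: kernel-style return values are >= 0 on success
 * and -errno on failure (e.g. -12 == -ENOMEM on Linux). Map such a
 * value to the C library's description of the error. */
const char *describe_kernel_ret(int ret)
{
    if (ret >= 0)
        return "success";
    return strerror(-ret); /* -(-ENOMEM) == ENOMEM */
}
```

So `describe_kernel_ret(-12)` yields the same text as `strerror(ENOMEM)`, i.e. an out-of-memory error, not a pinning-specific failure.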

I will take a look.
That looks like a completely unrelated memory leak to me.

Probably best if you open up a bug report for this.
Yes, the monitor still turns off after applying the patch "make the pool
shrinker lock a mutex".
Anyway, the patch fixed the flood of "BUG: sleeping function called
from invalid context at mm/vmalloc.c:1756" messages, so the kernel
log became cleaner.

At least some progress. Any objections if I add your e-mail address as a Tested-by tag?

Now the monitor turning off looks like this in the logs:

DMA-API: cacheline tracking ENOMEM, dma-debug disabled
amdgpu 0000:0b:00.0: amdgpu: 000000006b791523 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
BUG: kernel NULL pointer dereference, address: 0000000000000060
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 20 PID: 3780 Comm: brave:cs0 Tainted: G W --------- --- 5.11.0-0.rc2.20210108gitf5e6c330254a.120.fc34.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 2802 10/21/2020
RIP: 0010:ttm_tt_swapin+0x34/0x1b0 [ttm]
Code: 55 41 54 55 53 48 83 ec 10 48 8b 47 20 48 89 44 24 08 48 85 c0 0f 84 86 01 00 00 48 8b 44 24 08 49 89 fc 4c 8b a8 e0 01 00 00 <41> 8b 45 60 89 44 24 04 8b 47 0c 85 c0 0f 84 df 00 00 00 31 db 65
RSP: 0018:ffffa7400532b9c0 EFLAGS: 00010286
RAX: ffff978e2ae25800 RBX: ffff97910ec12058 RCX: ffff978e12caac70
RDX: 0000000080000010 RSI: 0000000000000000 RDI: ffff97912c3d99c0
RBP: ffff97912c3d99c0 R08: 0000000000000000 R09: 0000000070b3a000
R10: 0000000000000002 R11: 0000000000000000 R12: ffff97912c3d99c0
R13: 0000000000000000 R14: ffffa7400532ba90 R15: ffff978e182c6350
FS: 00007f070bb1b640(0000) GS:ffff979509200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 00000001f0cd2000 CR4: 0000000000350ee0
Call Trace:
ttm_tt_populate+0xa9/0xe0 [ttm]
ttm_bo_handle_move_mem+0x142/0x180 [ttm]
ttm_bo_validate+0x12e/0x1c0 [ttm]
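The Code: line narrows this down even without debug info. Just before the marked instruction are the bytes `4c 8b a8 e0 01 00 00`, which load a 64-bit value into R13 from offset 0x1e0 of the object in RAX. The faulting bytes `<41> 8b 45 60` then read 32 bits from R13 + 0x60. With R13 = 0, that access address is 0x60, which matches CR2. So the pointer loaded from RAX + 0x1e0 was NULL.

A minimal C sketch of that manual decode follows. It is a hypothetical helper that handles only the `REX 8B /r` load form with an 8- or 32-bit displacement. It is not a general x86 disassembler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of decoding "mov reg, [base + disp]". */
struct mem_load {
    int dest;     /* destination register number 0..15 (0 = rax/eax, 13 = r13) */
    int base;     /* base register number 0..15 */
    int32_t disp; /* signed displacement added to base */
    bool wide;    /* REX.W set: 64-bit load instead of 32-bit */
};

/* Decode only REX-prefixed opcode 0x8B with ModRM mod=01 (disp8) or
 * mod=10 (disp32). SIB, RIP-relative and register forms are rejected. */
bool decode_mov_load(const uint8_t *code, size_t n, struct mem_load *out)
{
    if (n < 4 || (code[0] & 0xf0) != 0x40 || code[1] != 0x8b)
        return false;

    uint8_t rex = code[0];
    uint8_t modrm = code[2];
    int mod = modrm >> 6;
    int reg = (modrm >> 3) & 7;
    int rm = modrm & 7;

    /* rm == 4 means a SIB byte follows; mod 0 / mod 3 are not handled. */
    if (rm == 4 || mod == 0 || mod == 3)
        return false;

    out->dest = reg | ((rex & 0x4) ? 8 : 0); /* REX.R extends ModRM.reg */
    out->base = rm | ((rex & 0x1) ? 8 : 0);  /* REX.B extends ModRM.rm */
    out->wide = (rex & 0x8) != 0;            /* REX.W selects 64-bit */

    if (mod == 1) {
        out->disp = (int8_t)code[3]; /* sign-extended 8-bit displacement */
    } else {
        if (n < 7)
            return false;
        /* x86 immediates are little-endian; assemble explicitly. */
        uint32_t d = (uint32_t)code[3] | ((uint32_t)code[4] << 8) |
                     ((uint32_t)code[5] << 16) | ((uint32_t)code[6] << 24);
        out->disp = (int32_t)d;
    }
    return true;
}
```

Feeding it `41 8b 45 60` gives dest 0 (EAX), base 13 (R13), disp 0x60. Feeding it `4c 8b a8 e0 01 00 00` gives dest 13 (R13), base 0 (RAX), disp 0x1e0, a 64-bit load.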

I can take a look at this one here. Looks like some missing error handling when allocating memory.

Can you decode which line number ttm_tt_swapin+0x34 points to?
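In the oops, `ttm_tt_swapin+0x34/0x1b0` means the fault happened 0x34 bytes into ttm_tt_swapin, a function that is 0x1b0 bytes long. To turn that into a source line, resolve the offset against a ttm.ko that has debug info. The kernel tree's scripts/faddr2line does this, as does gdb's `list *(ttm_tt_swapin+0x34)`.

A minimal C sketch of the first step, splitting the notation into its parts, follows. `parse_oops_symbol` is a hypothetical helper, not part of any kernel tool.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for oops symbols printed as "function+offset/size"
 * (offset and size in hex, optionally followed by " [module]").
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int parse_oops_symbol(const char *s, char *name, size_t name_len,
                      unsigned long *offset, unsigned long *size)
{
    const char *plus = strchr(s, '+');
    const char *slash = plus ? strchr(plus, '/') : NULL;
    if (!plus || !slash || plus == s)
        return -1;

    /* Copy the function name that precedes '+'. */
    size_t len = (size_t)(plus - s);
    if (len >= name_len)
        return -1;
    memcpy(name, s, len);
    name[len] = '\0';

    char *end;
    /* With base 16, strtoul accepts an optional "0x" prefix. */
    *offset = strtoul(plus + 1, &end, 16);
    if (end == plus + 1 || end != slash)
        return -1;
    *size = strtoul(slash + 1, &end, 16);
    if (end == slash + 1 || (*end != '\0' && *end != ' '))
        return -1;
    return 0;
}
```

For the trace above, the result is name "ttm_tt_swapin", offset 0x34 and size 0x1b0. The offset is what faddr2line or gdb needs.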

[SNIP]

You said that I need to open up a bug report. Do you mean the site
https://bugzilla.kernel.org/ ?
I thought the mailing list was better because bug reports on
bugzilla.kernel.org usually stay open for several years without
attention.

Please use this one here: https://gitlab.freedesktop.org/drm/amd/-/issues/new

If you can't find the DC guys offhand in the assignee list, just assign it to me and I will forward it.

But what you have in your logs so far are only unrelated symptoms; the root of the problem is that somebody is leaking memory.

What you could also do is enable kmemleak and maybe try a bleeding-edge branch like drm-misc-fixes or Alex's amd-staging-drm-next branch.
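kmemleak needs a kernel built with CONFIG_DEBUG_KMEMLEAK=y and is driven through /sys/kernel/debug/kmemleak. Writing `scan` to that file triggers an immediate scan, and reading it lists suspected leaks with their allocation backtraces. Below is a minimal C sketch of that sequence. It assumes debugfs is mounted at /sys/kernel/debug and that the program runs as root. `kmemleak_scan_and_dump` is a hypothetical helper name.

```c
#include <stdio.h>

/* Default kmemleak control file (assumes debugfs at /sys/kernel/debug). */
#define KMEMLEAK_PATH "/sys/kernel/debug/kmemleak"

/* Hypothetical helper: trigger an immediate kmemleak scan, then copy the
 * current leak report to 'out'. Returns 0 on success, -1 if the control
 * file cannot be opened or the "scan" command cannot be written. */
int kmemleak_scan_and_dump(const char *path, FILE *out)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    /* Writing "scan" asks kmemleak to scan memory now; the data is
     * flushed to the kernel when the stream is closed. */
    if (fputs("scan", f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    /* Reading the same file returns the suspected leaks. */
    f = fopen(path, "r");
    if (!f)
        return -1;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        fwrite(buf, 1, n, out);
    fclose(f);
    return 0;
}
```

Usage: after reproducing the monitor problem, call `kmemleak_scan_and_dump(KMEMLEAK_PATH, stdout)` and attach the output to the bug report.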

Thanks for the help,
Christian.