Re: WARNING: AMDGPU DRM warning in 5.11.9

From: Christian König
Date: Thu Mar 25 2021 - 04:30:18 EST


Hi,

On 25.03.21 at 09:17, Oleksandr Natalenko wrote:
Hello.

On Thu, Mar 25, 2021 at 07:57:33AM +0200, Ilkka Prusi wrote:
On 24.3.2021 16.16, Chris Rankin wrote:
Hi,

These warnings are not present in my dmesg log from 5.11.8:

[ 43.390159] ------------[ cut here ]------------
[ 43.393574] WARNING: CPU: 2 PID: 1268 at drivers/gpu/drm/ttm/ttm_bo.c:517 ttm_bo_release+0x172/0x282 [ttm]
[ 43.401940] Modules linked in: nf_nat_ftp nf_conntrack_ftp cfg80211

Changing WARN_ON to WARN_ON_ONCE in ttm_bo_release() (drivers/gpu/drm/ttm/ttm_bo.c)
reduces the flood of messages to a single splat.

This warning appears to come from 57fcd550eb15bce ("drm/ttm: Warn on pinning
without holding a reference"), and reverting it might be one option.


There are others, but I am assuming there is a common cause here.

Cheers,
Chris

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index a76eb2c14e8c..50b53355b265 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -514,7 +514,7 @@ static void ttm_bo_release(struct kref *kref)
 			 * shrinkers, now that they are queued for
 			 * destruction.
 			 */
-			if (WARN_ON(bo->pin_count)) {
+			if (WARN_ON_ONCE(bo->pin_count)) {
 				bo->pin_count = 0;
 				ttm_bo_del_from_lru(bo);
 				ttm_bo_add_mem_to_lru(bo, &bo->mem);



--
- Ilkka

WARN_ON_ONCE() will just hide the underlying problem. Do we know why
this happens at all?
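
To illustrate why the one-shot variant only hides the problem, here is a
minimal userspace sketch. It is not the kernel's implementation: the real
macros also dump a stack trace, and model_warn_on(), model_warn_on_once()
and release_pinned() are hypothetical names made up for this example. The
point is that the "still pinned" branch runs on every release either way;
only the number of splats changes.

```c
#include <stdbool.h>
#include <stdio.h>

static int splats;	/* number of warnings "printed" so far */

/* Hypothetical stand-in for WARN_ON(): report every time cond is true. */
static bool model_warn_on(bool cond)
{
	if (cond) {
		splats++;
		fprintf(stderr, "WARNING: condition hit\n");
	}
	return cond;
}

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): report only the first hit,
 * but still return the condition so the caller's branch is taken.
 */
static bool model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		splats++;
		fprintf(stderr, "WARNING: condition hit (once)\n");
	}
	return cond;
}

/*
 * Release n objects that are all still pinned and return how many times
 * the "still pinned" branch ran.
 */
static int release_pinned(int n, bool once)
{
	int taken = 0;

	for (int i = 0; i < n; i++) {
		int pin_count = 1;	/* every object is released while pinned */

		if (once ? model_warn_on_once(pin_count)
			 : model_warn_on(pin_count))
			taken++;
	}
	return taken;
}
```

Calling release_pinned(5, false) and release_pinned(5, true) takes the
branch five times in both cases, so the underlying bug is just as present
with the one-shot variant; it is merely reported once.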

The patch was incorrectly backported to 5.11 without also backporting the driver changes that keep this warning from triggering.

We are probably going to revert it for 5.11.10.
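
As a rough illustration of how a check and its callers can get out of sync,
here is a toy userspace model. This thread does not say what the missing
driver changes were, so "unpin before dropping the last reference" is purely
an assumption for the example, and struct toy_bo and all toy_*/teardown_*
names are hypothetical. Only the check in toy_put() mirrors the quoted hunk.

```c
/*
 * Toy buffer object; the pin_count name mirrors the quoted hunk, but this
 * is not the kernel's struct ttm_buffer_object.
 */
struct toy_bo {
	int refcount;
	int pin_count;
};

static int toy_warnings;	/* how many times the release check fired */

static void toy_pin(struct toy_bo *bo)
{
	bo->pin_count++;
}

static void toy_unpin(struct toy_bo *bo)
{
	bo->pin_count--;
}

/*
 * Drop one reference. On the last put, apply the same kind of check as
 * the quoted hunk: a nonzero pin_count is reported and then reset.
 */
static void toy_put(struct toy_bo *bo)
{
	if (--bo->refcount > 0)
		return;

	if (bo->pin_count) {
		toy_warnings++;
		bo->pin_count = 0;
	}
}

/* Assumed "old" caller: drops the last reference while still pinned. */
static void teardown_without_unpin(void)
{
	struct toy_bo bo = { .refcount = 1, .pin_count = 0 };

	toy_pin(&bo);
	toy_put(&bo);		/* check fires here */
}

/* Assumed "updated" caller: unpins before the final put. */
static void teardown_with_unpin(void)
{
	struct toy_bo bo = { .refcount = 1, .pin_count = 0 };

	toy_pin(&bo);
	toy_unpin(&bo);
	toy_put(&bo);		/* check stays quiet */
}
```

In this model, adding the check without also updating the callers makes
teardown_without_unpin() warn on every release, which is the general shape
of a backport that picks up a new check but not the matching caller fixes.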

Regards,
Christian.


Same for me, BTW, with v5.11.9:

```
[~]> lspci | grep VGA
0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)

[ 3676.033140] ------------[ cut here ]------------
[ 3676.033153] WARNING: CPU: 7 PID: 1318 at drivers/gpu/drm/ttm/ttm_bo.c:517 ttm_bo_release+0x375/0x500 [ttm]

[ 3676.033340] Hardware name: ASUS System Product Name/Pro WS X570-ACE, BIOS 3302 03/05/2021

[ 3676.033469] Call Trace:
[ 3676.033473] ttm_bo_move_accel_cleanup+0x1ab/0x3a0 [ttm]
[ 3676.033478] amdgpu_bo_move+0x334/0x860 [amdgpu]
[ 3676.033580] ttm_bo_validate+0x1f1/0x2d0 [ttm]
[ 3676.033585] amdgpu_cs_bo_validate+0x9b/0x1c0 [amdgpu]
[ 3676.033665] amdgpu_cs_list_validate+0x115/0x150 [amdgpu]
[ 3676.033743] amdgpu_cs_ioctl+0x873/0x20a0 [amdgpu]
[ 3676.033960] drm_ioctl_kernel+0xb8/0x140 [drm]
[ 3676.033977] drm_ioctl+0x222/0x3c0 [drm]
[ 3676.034071] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 3676.034145] __x64_sys_ioctl+0x83/0xb0
[ 3676.034149] do_syscall_64+0x33/0x40

[ 3676.034171] ---[ end trace 66e9865b027112f3 ]---
```

Thanks.