Re: [Bug][5.18-rc0] Between commits ed4643521e6a and 34af78c4e616, appears warning "WARNING: CPU: 31 PID: 51848 at drivers/dma-buf/dma-fence-array.c:191 dma_fence_array_create+0x101/0x120" and some games stopped working.

From: Christian König
Date: Fri Apr 08 2022 - 10:27:32 EST


Am 08.04.22 um 14:24 schrieb Mikhail Gavrilov:
On Fri, 8 Apr 2022 at 16:13, Christian König <christian.koenig@xxxxxxx> wrote:

I own you a beer.

I still don't know what happens here, but that makes at least a bit more
sense than a patch which only changes comments :)

Looks like we are missing something here. Can I send you a patch to try
something later today?
Yes, please feel free to send me a patch for testing.


Please test the attached patch, it just re-introduce the lock without doing much else.

And does your branch contain the following patch:

commit d18b8eadd83e3d8d63a45f9479478640dbcfca02
Author: Christian König <christian.koenig@xxxxxxx>
Date:   Wed Feb 23 14:35:31 2022 +0100

    drm/amdgpu: install ctx entities with cmpxchg

    Since we removed the context lock we need to make sure that not two threads
    are trying to install an entity at the same time.

    Signed-off-by: Christian König <christian.koenig@xxxxxxx>
    Fixes: 461fa7b0ac565e ("drm/amdgpu: remove ctx->lock")
    Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>

Thanks,
Christian.From e2e39cb1a4a1c7c0e3ff2e4e0188394b0eda0ba6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@xxxxxxx>
Date: Fri, 8 Apr 2022 16:22:55 +0200
Subject: [PATCH] drm/amdgpu: partial revert "remove ctx->lock"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This reverts commit 461fa7b0ac565ef25c1da0ced31005dd437883a7.

We are missing some inter dependencies here.

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 +
3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8de283997769..5471b93f6808 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -128,6 +128,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
goto free_chunk;
}

+ mutex_lock(&p->ctx->lock);
+
/* skip guilty context job */
if (atomic_read(&p->ctx->guilty) == 1) {
ret = -ECANCELED;
@@ -688,6 +690,7 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
dma_fence_put(parser->fence);

if (parser->ctx) {
+ mutex_unlock(&parser->ctx->lock);
amdgpu_ctx_put(parser->ctx);
}
if (parser->bo_list)
@@ -1332,6 +1335,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
goto out;

r = amdgpu_cs_submit(&parser, cs);
+
out:
amdgpu_cs_parser_fini(&parser, r, reserved_buffers);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 5981c7d9bd48..8f0e6d93bb9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -237,6 +237,7 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,

kref_init(&ctx->refcount);
spin_lock_init(&ctx->ring_lock);
+ mutex_init(&ctx->lock);

ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
ctx->reset_counter_query = ctx->reset_counter;
@@ -357,6 +358,7 @@ static void amdgpu_ctx_fini(struct kref *ref)
drm_dev_exit(idx);
}

+ mutex_destroy(&ctx->lock);
kfree(ctx);
}

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index d0cbfcea90f7..142f2f87d44c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -49,6 +49,7 @@ struct amdgpu_ctx {
bool preamble_presented;
int32_t init_priority;
int32_t override_priority;
+ struct mutex lock;
atomic_t guilty;
unsigned long ras_counter_ce;
unsigned long ras_counter_ue;
--
2.25.1