Re: [git pull] drm for 6.10-rc1

From: Dave Airlie
Date: Wed May 15 2024 - 19:50:31 EST


On Thu, 16 May 2024 at 06:29, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 15 May 2024 at 13:24, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I have to revert both
> >
> > a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
> > e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
> >
> > to make things build cleanly. Next step: see if it boots and fixes the
> > problem for me.
>
> Well, perhaps not surprisingly, the WARN_ON() no longer triggers with
> this, and everything looks fine.
>
> Let's see if the machine ends up being stable now. It took several
> hours for the "scary messages" state to turn into the "hung machine"
> state, so they *could* have been independent issues, but it seems a
> bit unlikely.

I think that should be fine to do for now.

I think applying the patch I've attached instead would also be fine,
but I'm not sure I'd take that chance.

Two questions for Arunpravin (and Alex):

Is this fix correct, and can we get a good explanation of it?

Where did this error sneak in? Is the problem in the amdgpu tree, or
was it a drm-next only problem? If so perhaps we need to discuss
moving amdgpu more into drm-tip to catch this sort of problem.

Dave.

From 085b89278f296c40e86f5d1e1bcc1017c39f4002 Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied@xxxxxxxxxx>
Date: Thu, 16 May 2024 09:46:37 +1000
Subject: [PATCH] drm/buddy: convert WARN_ON to an if + continue

This WARN_ON triggers a lot, but I don't think the __force_merge
path always has to succeed, so just skip the block here instead of
warning, and let other paths handle the allocation.

(Not 100% sure on this patch - airlied).
---
drivers/gpu/drm/drm_buddy.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 284ebae71cc4..6b90ec6eefa8 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -195,8 +195,9 @@ static int __force_merge(struct drm_buddy *mm,
 			if (!drm_buddy_block_is_free(buddy))
 				continue;
 
-			WARN_ON(drm_buddy_block_is_clear(block) ==
-				drm_buddy_block_is_clear(buddy));
+			if (drm_buddy_block_is_clear(block) !=
+			    drm_buddy_block_is_clear(buddy))
+				continue;
 
 			/*
 			 * If the prev block is same as buddy, don't access the
--
2.44.0
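
For readers following the hunk above, the control-flow change can be
sketched in user space roughly as below. This is only an illustration
under stated assumptions: struct pair, count_mergeable() and
force_merge_sketch() are hypothetical names, not kernel APIs, and the
real __force_merge() does considerably more (range checks, list
handling, the actual merge). The sketch shows only that a pair whose
clear states differ is now skipped silently, and that the caller can
end up seeing a failure when nothing was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one block/buddy pair; not struct drm_buddy_block. */
struct pair {
	bool buddy_free;	/* models drm_buddy_block_is_free(buddy) */
	bool block_clear;	/* models drm_buddy_block_is_clear(block) */
	bool buddy_clear;	/* models drm_buddy_block_is_clear(buddy) */
};

/*
 * Count the pairs that get past both checks in the patched loop:
 * skip if the buddy is not free, then skip (rather than WARN_ON)
 * if the clear states of block and buddy differ.
 */
static size_t count_mergeable(const struct pair *pairs, size_t n)
{
	size_t merged = 0;

	assert(pairs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!pairs[i].buddy_free)
			continue;

		/* Patched behaviour: no warning, just try the next pair. */
		if (pairs[i].block_clear != pairs[i].buddy_clear)
			continue;

		merged++;
	}

	return merged;
}

/* Loose analogue of the caller seeing an error when nothing was merged. */
static int force_merge_sketch(const struct pair *pairs, size_t n)
{
	return count_mergeable(pairs, n) > 0 ? 0 : -1;
}
```

With this model, a list made up only of non-free buddies or pairs with
differing clear states makes force_merge_sketch() return -1 without
printing anything, which is the "let other paths handle the
allocation" outcome the commit message describes.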