Re: [PATCH v5 2/2] drm/amd/display: add DMU timeout recovery support
From: Leo Li
Date: Tue May 05 2026 - 12:45:25 EST
On 2026-05-05 12:36, Leo Li wrote:
>
>
> On 2026-05-05 12:02, Leo Li wrote:
>>> + /*
>>> + * Compositors will refuse to make forward progress unless we send
>>> + * the previous flip's completion event.
>>> + */
>>> + if (WARN_ON(acrtc->event)) {
>>> + drm_crtc_send_vblank_event(&acrtc->base, acrtc->event);
>>> + drm_crtc_vblank_put(&acrtc->base);
>>> + }
>> I would expect this WARN_ON to occur only after the 10s flip_done timeout expires, allowing 'this' commit to progress with the previously armed acrtc->event and ->pflip_status from the previous commit ('this' commit would be gated by drm_atomic_helper_wait_for_dependencies).
>>
>> In which case, we probably want to apply the same above hunk for the cursor path here and also raise a warning: https://elixir.bootlin.com/linux/v6.19.3/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L10170
>
> Hmm, scratch that, looks like I didn't finish my own thought:
>
> I think this would also mean the compositor sent 'this' commit without waiting for the previous vblank event. IOW it's not that compositors refuse to make forward progress, but wait_for_dependencies() in kernel will wait for a flip_done completion that will never come. And that can be a long time, since the drm_crtc_commit_wait()s stack.
>
> Indeed, it seems to be the case in the dmesg log attached to this issue: https://gitlab.freedesktop.org/drm/amd/-/work_items/4809
>
> The back-to-back WARN_ONs are likely from the hunk above, and one level up at https://elixir.bootlin.com/linux/v7.0.1/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L10186. The compositor attempts another commit soon after, and hits the series of timeouts within wait_for_dependencies(), before we hit the next series of WARN_ONs from the same locations.
>
> So ultimately, I don't think sending the event here will do anything. Since by the time we hit that WARN_ON, we've already waited through wait_for_dependencies(). At which point, there aren't anymore waiters on it.
>
> I'm thinking an alternative would be to issue a 1-2s delayed worker whenever prepare_flip_isr() is called. The worker is canceled whenever the event is sent from the irq handlers. If it runs, then the expected pflip interrupt never fired, so we deliver the event in the worker. It's effectively a SW fallback for event delivery.
>
> Of course, it is only a fallback, ideally we figure out why interrupts were missed in the first place.
>
> - Leo
(Apologies for the spam)
But regarding this patch, I'm OK with dropping the above hunk but leaving the dm_helpers_dmu_timeout() implementation.
With that change, this is
Reviewed-by: Leo Li <sunpeng.li@xxxxxxx>