Re: drm/vkms: deadlock between dev->event_lock and timer

From: Linus Torvalds
Date: Wed Sep 13 2023 - 12:49:07 EST


On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ .
> It looks like this locking pattern remains as of 6.6-rc1. Please fix.
>
> void drm_crtc_vblank_off(struct drm_crtc *crtc) {
> spin_lock_irq(&dev->event_lock);
> drm_vblank_disable_and_save(dev, pipe) {
> __disable_vblank(dev, pipe) {
> crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank {
> hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds.
> ret = hrtimer_try_to_cancel(timer) {
> base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress.

Heh. Ok. This is clearly a bug, but it does seem to be limited to just
the vkms driver, and literally only to the "simulate vblank" case.

The worst part about it is that it's so subtle and not obvious.

Some light grepping seems to show that amdgpu has almost the exact
same pattern in its own vkms thing, except it uses

hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer);

directly, which presumably fixes the livelock, but means that the
cancel will fail if it's currently running.

So just doing the same thing in the vkms driver probably fixes things.

Maybe the vkms people need to add a flag to say "it's canceled" so
that it doesn't then get re-enabled? Or maybe it doesn't matter
and/or already happens for some reason I didn't look into.

Linus