Re:Re: [PATCH 6.18.y] drm/vkms: Fix ABBA deadlock in vblank disable and timer callback
From: w15303746062
Date: Tue May 26 2026 - 07:17:31 EST
Hi Maarten,
>As far as I can tell, if it's just a bug affecting vkms, all you need to do
>is only a few commits:
>
>74afeb812850 ("drm/vblank: Add vblank timer")
>d54dbb5963bd ("drm/vblank: Add CRTC helpers for simple use cases")
>02e2681ffe1a ("drm/vkms: Convert to DRM's vblank timer")
>79ae8510b5b8 ("drm/atomic: Increase timeout in drm_atomic_helper_wait_for_vblanks()")
>3946d3ba9934 ("drm/vblank: Fix kernel docs for vblank timer")
>
>There's no need to convert all other drivers if it's only vkms that you're fixing.
Thank you very much for pointing out this precise dependency chain; it greatly simplified the backport. I have cherry-picked these 5 commits onto the 6.18.y branch, and they apply cleanly without pulling in the much larger DRM core refactoring.
This series completely resolves the Syzkaller RCU stall (soft lockup) I was observing in my local fuzzing environment. I have just submitted this 5-patch series to the list.
>But since you found this bug in one driver, it might be wise to check if others
>have the same bug and ask for backports for those too.
Following your suggestion, I conducted a static lock dependency audit of drivers/gpu/drm/ in the 6.18.y tree. I looked specifically for the same misuse of hrtimer_cancel() with custom vblank/polling timers: calling hrtimer_cancel(), which waits for a running callback to finish, while holding a lock that the timer callback itself acquires.
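To make the pattern concrete, here is a minimal userspace sketch of the hazard. It assumes a pthread can stand in for the hrtimer callback and pthread_join() for hrtimer_cancel(), since both wait for a running callback to return. state_lock, timer_callback() and disable_path() are hypothetical names, not vkms or DRM code; the callback uses pthread_mutex_timedlock() only so that the demo reports the deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the lock shared by the disable path and the
 * vblank timer callback (not the actual vkms/DRM lock). */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the hrtimer callback: it needs state_lock. A timed lock is
 * used so the demo reports the deadlock instead of hanging forever. */
static void *timer_callback(void *arg)
{
    bool *would_deadlock = arg;
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1; /* give up after roughly one second */

    if (pthread_mutex_timedlock(&state_lock, &deadline) != 0) {
        *would_deadlock = true; /* a plain lock would block forever here */
        return NULL;
    }
    /* ... handle the simulated vblank under the lock ... */
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Stand-in for the disable path. pthread_join() plays the role of
 * hrtimer_cancel(): it waits until the running callback has returned. */
static bool disable_path(bool cancel_under_lock)
{
    pthread_t timer;
    bool would_deadlock = false;

    pthread_mutex_lock(&state_lock);
    pthread_create(&timer, NULL, timer_callback, &would_deadlock);
    /* ... update shared state under the lock ... */

    if (cancel_under_lock) {
        /* Hazard: wait for the callback while holding the lock it needs. */
        pthread_join(timer, NULL);
        pthread_mutex_unlock(&state_lock);
    } else {
        /* Safe ordering: drop the lock first, then wait. */
        pthread_mutex_unlock(&state_lock);
        pthread_join(timer, NULL);
    }
    return would_deadlock;
}
```

disable_path(true) reports the would-be deadlock after about a second, while disable_path(false), which drops the lock before waiting, completes normally.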
The most suspicious candidates I audited include:
1. i915/gvt (virtual display emulation: vblank_timer_fn vs intel_vgpu_clean_display)
2. xe (OA buffer polling: xe_oa_poll_check_timer_cb vs xe_oa_stream_disable)
3. msm (fence deadlines & devfreq: deadline_timer vs msm_update_fence)
Fortunately, these drivers are structurally safe from this specific ABBA deadlock pattern. They avoid it in one of two ways: either the timer callback is decoupled from the lock context (in msm_fence and i915/gvt, the timer only calls wake_up() or queues work, without taking any mutex or spinlock), or fine-grained locking ensures that the cancel path and the timer callback never contend for the same lock (xe OA stream polling).
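The decoupled style can be sketched in the same userspace model, again with hypothetical names and pthreads standing in for the hrtimer and workqueue machinery. The callback only records pending work through an atomic counter (the analogue of queue_work() or wake_up()), so the cancel path may wait for it while holding state_lock. The deferred, lock-protected part runs only after the lock is dropped, since flushing work that takes the same lock while still holding it would simply recreate the deadlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state; not taken from any real driver. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int pending_work = 0; /* "queued work" counter, lock-free */
static int frames_handled = 0;      /* protected by state_lock */

/* Timer callback in the decoupled style: it never takes state_lock, it
 * only records that work is pending (like queue_work()/wake_up()). */
static void *timer_callback(void *arg)
{
    (void)arg;
    atomic_fetch_add(&pending_work, 1);
    return NULL;
}

/* Deferred worker: performs the lock-protected part outside timer context. */
static void run_deferred_work(void)
{
    pthread_mutex_lock(&state_lock);
    frames_handled += atomic_exchange(&pending_work, 0);
    pthread_mutex_unlock(&state_lock);
}

/* Cancel path: waiting for the callback while holding state_lock is safe
 * here because the callback can never block on state_lock. */
static int disable_and_flush(void)
{
    pthread_t timer;

    pthread_mutex_lock(&state_lock);
    pthread_create(&timer, NULL, timer_callback, NULL);
    pthread_join(timer, NULL);          /* "hrtimer_cancel()" under the lock */
    pthread_mutex_unlock(&state_lock);

    run_deferred_work();                /* "flush" only after dropping it */
    return frames_handled;
}
```

Each call to disable_and_flush() handles exactly one simulated frame, so the first call returns 1 and the second returns 2.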
Therefore, as far as I can tell, vkms was a unique legacy outlier in this regard, and no further backports appear to be needed for other DRM drivers for this specific issue.
Thanks again for the roadmap and the thorough review.
Best regards,
Mingyu