Re: Time from regression report to a merge of a fix (was Re: [git pull] drm fixes for 7.0-rc1)
From: Dave Airlie
Date: Tue Feb 24 2026 - 01:51:15 EST
On Mon, 23 Feb 2026 at 22:52, Thorsten Leemhuis
<regressions@xxxxxxxxxxxxx> wrote:
>
> Lo!
>
> On 2/20/26 21:53, Dave Airlie wrote:
> >
> > This is the fixes and cleanups for the end of the merge window, it's
> > nearly all amdgpu, with some amdkfd, then a pagemap core fix, i915/xe
> > display fixes, and some xe driver fixes. Nothing seems out of the
> > ordinary, except amdgpu is a little more volume than usual.
> >
> > Let me know if there are any issues,
>
> Well, there were two fixes in here that made me wonder if our processes
> need some optimization to get regressions fixed at least somewhat as
> fast as Linus wants them to be fixed[1]:
>
> * One fix in here was for an amdgpu regression introduced in v6.19-rc6
> (and also affecting many stable series due to backports). The fix was
> ready within ~2 days and could even have made v6.19 -- but it only
> reached mainline through this PR on Friday. IOW: After two weeks. Which
> got me wondering, "Should we do something to merge fixes like that
> faster"? And yes, it's the merge window – but that's also when Arch
> Linux and openSUSE Tumbleweed usually jump to the latest mainline series
> and thus expose regressions like this to many users, so I guess it would
> be good to get them fixed at least as fast as outside of merge windows.
>
> * One fix in here was for an i915/xe regression introduced in v6.18-rc1.
> Once reported, it took about six weeks to get fixed – and then nearly 10
> days for the fix to reach mainline. Looking at this, I once more
> wondered if this could have been merged faster. But even more I wondered
> why the culprit wasn't reverted, as that's what Linus afaics wants when
> it takes this long.
>
> Note, these are examples of problems that happen in other subsystems as
> well; I chose to bring it up here just because they were good examples,
> as both regressions were also reported at least three times, so those
> are not really corner cases. See below for all the details.
>
> [1] "But if it's a regression with a known commit that caused it, I
> think the rule of thumb [to fix it] should generally be "within a week",
> preferably before the next rc."
> https://lore.kernel.org/all/CAHk-%3Dwi86AosXs66-yi54%2BmpQjPu0upxB8ZAfG%2BLsMyJmcuMSA@xxxxxxxxxxxxxx/
>
>
> > Mario Limonciello (2):
> > [...]
> > drm/amd: Fix hang on amdgpu unload by using pci_dev_is_disconnected()
>
> This is f7afda7fcd169a ("drm/amd: Fix hang on amdgpu unload by using
> pci_dev_is_disconnected()") [authored: 2026-02-05 17:42:54 GMT+1;
> committed: 2026-02-05 23:25:57 GMT+1 by Alex; next arrival:
> next-20260209; merged: 2026-02-21 00:36:38 GMT+1; v6.19-post].
>
> It fixes a regression that has been reported at least three times:
>
> * On Tue, 3 Feb 2026 17:27:00 -0500 (EST):
> https://lore.kernel.org/all/b0c22deb-c0fa-3343-33cf-fd9a77d7db99@xxxxxxxxxxxxxxxxxxx/
>
> * On February 5, 2026 at 1:30:12 PM GMT+1:
> https://gitlab.freedesktop.org/drm/amd/-/issues/4944
>
> * On February 18, 2026 at 9:30:39 PM GMT+1:
> https://gitlab.freedesktop.org/drm/amd/-/issues/4984
>
> And likely a fourth time on February 7, 2026 at 7:25:40 PM GMT+1:
> https://gitlab.freedesktop.org/drm/amd/-/issues/4953
>
> The culprit is 28695ca09d3264 ("drm/amd: Clean up kfd node on surprise
> disconnect") [also known as 6a23e7b4332c10; authored: 2026-01-07
> 22:37:28; committed: 2026-01-14 20:51:36; next arrival: next-20260119;
> merged: 2026-01-16 22:48:18; v6.19-rc6 (2026-01-19 00:42:45), v6.18.7
> (2026-01-23 11:21:37), v6.12.67 (2026-01-23 11:18:52), v6.6.122
> (2026-01-30 10:27:43)]
>
> Mario and Alex thus had a fix ready and committed within about two days
> after it was first reported. It thus is an "immediate fix" (yeah!), just
> how Linus wants it (see [1] above).
>
> But then it took two weeks to get it mainlined -- and will now take a
> few days more to reach all those stable trees where it is needed, too.
>
> Given the dates above, it could have reached 6.19 (released 2026-02-08
> 22:03:27 GMT+1) if we really had wanted to.
>
> That fix could also have made the main drm PR this merge window (sent
> Wed, 11 Feb 2026 17:26:03 +1000), as Alex had already asked for merging on
> Fri, 6 Feb 2026 14:27:06 -0500:
> https://lore.kernel.org/all/CAPM=9tzgmO1PWeuxjAxqOmS5PTsOe8jHP9Poy23q6tvY66B1KQ@xxxxxxxxxxxxxx/
> https://lore.kernel.org/all/20260206192706.59396-1-alexander.deucher@xxxxxxx/
>
> If it had made that pull, the fix could already be in stable by now. Maybe
> Alex's PR just fell through the cracks. Happens, but overall this still
> made me wonder:
>
> (1) Should there maybe have been an additional PR this merge window to
> speed things up? Or some fast track for regressions?
>
> (2) Or should the fix (or a revert of the culprit) maybe even have been
> sent to Linus for 6.19? That would have saved at least one user from
> bisecting and reporting the regression (and likely a few others that
> never reported it).
>
> From Linus' mail I linked above, I'd assume he would have preferred the
> second option here, even if it would have been a last minute fix. If so:
> how could we make that happen more often in the future?
>
> Side note: yes, unbinding a module is likely something only a few users
> do -- but given those three or four reports, it seems it's not that
> unusual. And I don't care too much about this specific fix anyway, as
> it's just an example for the "time it takes fixes for recent regressions
> to reach mainline" aspect that I see all the time in many subsystems. To
> elaborate on that, let me give another example:
>
> > Imre Deak (2):
> > drm/i915/dp: Fix pipe BPP clamping due to HDR
>
> This is now fe26ae6ac8b88f ("drm/i915/dp: Fix pipe BPP clamping due to
> HDR") [authored: 2026-02-09 14:38:16 GMT+1; committed: 2026-02-12
> 07:03:08 GMT+1; next arrival: next-20260212; merged: 2026-02-21 00:36:38
> GMT+1; v6.19-post].
>
> That commit fixes a regression that has been reported at least three times:
>
> * On December 30, 2025 at 5:07:48 PM GMT+1
> https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15503
>
> * On January 13, 2026 at 11:51:11 PM GMT+1
> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7052
>
> * On February 15, 2026 at 10:13:48 PM GMT+1
> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7269
>
> That regression is caused by ba49a4643cf53c ("drm/i915/dp: Set min_bpp
> limit to 30 in HDR mode") [authored: 2025-07-30 07:55:23 GMT+;
> committed: 2025-08-19 08:32:40 GMT+; next arrival: next-20250820;
> merged: 2025-10-02 21:47:25 GMT+; v6.18-rc1 (2025-10-12 22:42:36 GMT+)].
>
> The regression took way longer to get resolved than the first example,
> which makes me wonder:
>
> (1) Should the culprit have been reverted weeks ago to get closer to the
> "immediate fix" target that Linus wants?
>
> (2) This fix also took nine days from being committed to reaching
> mainline. It came a bit too late for the first drm PR this cycle. So
> again: Would more frequent PRs help here? Or some fast-track path for
> regression fixes?
>
> Ciao, Thorsten