Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks

From: Joonas Lahtinen
Date: Mon Apr 20 2020 - 04:21:50 EST


Quoting Sultan Alsawaf (2020-04-20 08:24:19)
> Chris,
>
> Could you please look at this in earnest? This is a real bug that crashes my
> laptop without any kind of provocation. It is undeniably a bug in i915, and I've
> clearly described it in my patch. If you don't like the patch, I'm open to any
> suggestions you have for an alternative solution. My goal here is to make i915
> better, but it's difficult when communication only goes one way.

Hi Sultan,

The patch Chris pointed to was not part of the 5.4 release. Its commit
message says that it makes the functions tolerant of running
concurrently. As part of that change the zeroing of ring->vaddr is
removed, so the test that adds mdelay(1) and "ring->vaddr = NULL;" is
not correct.

I think you might have used the wrong git command for checking the
patch history:

$ git describe a266bf420060
v5.4-rc7-1996-ga266bf420060 # after -rc7 tag

$ git describe --contains a266bf420060
v5.6-rc1~34^2~21^2~326 # included in v5.6-rc1

And git log to double check:

$ git log --format=oneline kernel.org/stable/linux-5.4.y --grep="drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint"
$ git log --format=oneline kernel.org/stable/linux-5.5.y --grep="drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint"
0725d9a31869e6c80630e99da366ede2848295cc drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint
$ git log --format=oneline kernel.org/stable/linux-5.6.y --grep="drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint"
a754012b9f2323a5d640da7eb7b095ac3b8cd012 drm/i915/execlists: Leave resetting ring to intel_ring
0725d9a31869e6c80630e99da366ede2848295cc drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint
a266bf42006004306dd48a9082c35dfbff153307 drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint

So it seems that the patch got pulled into v5.6 and has been backported
to v5.5 but not v5.4.
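
To illustrate why the two "git describe" invocations above give different
answers, here is a minimal sketch in a throwaway repository. The tag names
(v1, v2) and commit subjects are made up for the example and have nothing to
do with the kernel tree. Plain "git describe" names the nearest tag that
precedes the commit, while "--contains" names the first tag that includes it,
and a subject grep is what finds a commit that was cherry-picked under a
different hash:

```shell
#!/bin/sh
# Sketch only: hypothetical repo, tags and subjects, not the kernel tree.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

git commit -q --allow-empty -m "base"
git tag -a -m v1 v1                     # tag that precedes the fix
git commit -q --allow-empty -m "the fix"
fix=$(git rev-parse HEAD)
git commit -q --allow-empty -m "later work"
git tag -a -m v2 v2                     # first tag that includes the fix

# Nearest tag reachable from the commit (v1 plus one commit on top).
before=$(git describe "$fix")
# First tag from which the commit is reachable (one step back from v2).
after=$(git describe --contains "$fix")
# Search by subject, which also works for backports with a new hash.
in_v1=$(git log --format=%s --grep="the fix" v1)
in_v2=$(git log --format=%s --grep="the fix" v2)

echo "describe:            $before"
echo "describe --contains: $after"
echo "subject found in v1: ${in_v1:-<none>}"
echo "subject found in v2: ${in_v2:-<none>}"
```

The first form only says the commit was written after v1, not that v1
shipped it, which is the same trap as reading v5.4-rc7-1996-g... as "in 5.4".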

Could you try applying the patch to 5.4 and seeing if the problem
persists?

Regards, Joonas