Re: [Intel-gfx] [KERNEL] Regression bug in drm/i915, Wrong assumption in commit e11aa36 breaks suspend on at least lenovo x61

From: Dave Gordon
Date: Thu Feb 19 2015 - 05:48:09 EST


On 18/02/15 16:24, Imre Deak wrote:
> On ke, 2015-02-18 at 17:39 +0200, Jani Nikula wrote:
>> On Tue, 17 Feb 2015, Klaus Ethgen <Klaus+lkml@xxxxxxxxx> wrote:
>>> After solving the conflicts, I applied the revert (see attachment) to
>>> v3.18.7. I think it should also apply to the current head. With that
>>> patch, suspend is working again on that version.
>>>
>>> However, I have not to deep knowledge of that subsystem, so please,
>>> someone who have, have a deeper look into it. I especially do not know
>>> if the lines in .../intel_pm.c are correct or better leaving them as
>>> they are in v3.18.7.
>>>
>>> I want to have it working on a version that I know is stable before
>>> asking to pull it to head.
>>
>> Hi Klaus, we fear this patch may hide the actual cause. Would be useful
>> to get a better description of what happens, along with a dmesg with
>> drm.debug=14 module parameter set. This might clutter the mailing list,
>> would you mind filing a bug at [1] and attach the info there?
>>
>> Thanks,
>> Jani.
>>
>> [1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel
>
> In addition to the above could you also try the following patch and
> provide a dmesg log on the bugzilla ticket - at this point only to try
> to narrow down the issue?:
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index d358ce8..02c65f4 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -4466,6 +4466,14 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
> I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT |
> I915_DISPLAY_PLANE_B_FLIP_PENDING_INTERRUPT;
>
> + if (!intel_irqs_enabled(dev_priv)) {
> + if (printk_ratelimit())
> + DRM_ERROR("spurious/shared interrupt iir %08x imr %08x ier %08x\n",
> + I915_READ(IIR), I915_READ(IMR), I915_READ(IER));
> +
> + return IRQ_NONE;
> + }
> +
> iir = I915_READ(IIR);
>
> for (;;) {
> @@ -4766,7 +4774,10 @@ void intel_runtime_pm_disable_interrupts(struct drm_device *dev)
> struct drm_i915_private *dev_priv = dev->dev_private;
>
> dev->driver->irq_uninstall(dev);
> +
> + spin_lock_irq(&dev_priv->irq_lock);
> dev_priv->pm._irqs_disabled = true;
> + spin_unlock_irq(&dev_priv->irq_lock);
> }
>
> /* Restore interrupts so we can recover from runtime PM. */
> @@ -4774,7 +4785,10 @@ void intel_runtime_pm_restore_interrupts(struct drm_device *dev)
> {
> struct drm_i915_private *dev_priv = dev->dev_private;
>
> + spin_lock_irq(&dev_priv->irq_lock);
> dev_priv->pm._irqs_disabled = false;
> + spin_unlock_irq(&dev_priv->irq_lock);
> +
> dev->driver->irq_preinstall(dev);
> dev->driver->irq_postinstall(dev);
> }

Surely surrounding (what ought to be) an atomic assignment to a single
variable cannot make a difference? Unless it's the memory barrier
semantics that have some effect? It seems unlikely that the compiler has
deferred the write to the variable past the pre/postinstall calls that
actually enable the h/w interrupts, but maybe we should add "volatile"
just in case?

.Dave.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/