Re: [git pull] drm merge for 3.9-rc1
From: Chris Wilson
Date: Wed Feb 27 2013 - 05:04:27 EST
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie <airlied@xxxxxxxx> wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro horrors cleaned up, killing lots of legacy GTT
> > code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [ 8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
That message appears to be the canary. For whatever reason the DP
transfer is not functioning, likely the VDD is not powered up. However,
the failure to communicate there causes the modeset to abort, resulting
in the blank screen.
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
So it is reproducible during suspend. That should help narrow down the
sequence, thank you.
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
The hardware is required to return a timedout error message after 400
microseconds. The timeout here is to catch the dysfunction driver, and
so was intended to be 10 milliseconds, cf
https://patchwork.kernel.org/patch/2160541/
As it happens with your machine 10 jiffies is approximately 10
millisecond, and so we should not be aborting before the hardware has
had a chance to signal failure. One way to check whether it is a failure
to setup the IRQ or a failure to setup the DP comms would be:
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 7b8bfe8..f2486f1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -356,9 +356,11 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
else
done = wait_for_atomic(C, 10) == 0;
- if (!done)
- DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
- has_aux_irq);
+ if (!done) {
+ status = I915_READ_NOTRACE(ch_ctl);
+ DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+ has_aux_irq, status);
+ }
#undef C
return status;
--
Chris Wilson, Intel Open Source Technology Centre
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/