Re: "Hangcheck timer elapsed... GPU hung" in 3.8.0-rc2

From: J. Bruce Fields
Date: Thu Jan 03 2013 - 18:11:10 EST


On Thu, Jan 03, 2013 at 04:16:24PM -0500, Josh Boyer wrote:
> On Thu, Jan 3, 2013 at 3:46 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > I got a crash after a few minutes of running 3.8.0-rc2, was able to
> > switch to a vt and look at dmesg:
> >
> > [ 490.962545] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > [ 490.963019] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> > [ 492.961446] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> > [ 492.965613] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> > [ 492.965621] [drm:i915_reset] *ERROR* Failed to reset chip.
> >
> > Previously I was on 3.6.10-2.fc17.x86_64, which didn't have any such
> > problem.
>
> I'm not questioning that you haven't seen that error in F17, but we have
> had quite a few bug reports with similar error messages for a while now.
> Apparently there are lots of ways GPUs can get hung, so they might be
> different from what you're seeing. Just wanted to point out that it
> might not be a new 3.8 change that caused it.

OK, sure. It reproduced very quickly after the upgrade, so I assumed it
was a regression.

I'm running 3.7.0 now, which hasn't shown any problems.

I'll try a newer kernel again to see if it's really that easy for me to
reproduce.

--b.