Re: drm/mgag200: doesn't work in panic context

From: Rui Wang
Date: Tue Jun 30 2015 - 03:41:27 EST


On Tuesday, June 30, 2015 2:37 PM, Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> On Tue, Jun 30, 2015 at 4:53 AM, Rui Wang <rui.y.wang@xxxxxxxxx> wrote:
> >
> > I think testing can be done by injecting a fatal machine check
> > exception via einj's debugfs interface. I can reproduce the hard hang every
> > time.
> > I think it can be a simple script or C program to do the automated testing.
> > If anyone has any patch I'll be happy to help test it out.
>
> Testing shouldn't kill the machine ;-)

Yes :) What I assumed was that, with a future patch applied, the machine would
reboot instead of hanging, so the test can be repeated.
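
A rough sketch of what such an einj-based test script might look like. The
file names (available_error_type, error_type, error_inj) are the ones described
in the kernel's APEI EINJ documentation. The choice of the "Processor
Uncorrectable fatal" type, the helper names, and the dry_run switch are
assumptions made for illustration, not something settled in this thread.

```python
import os

# Default location per the kernel's EINJ documentation; needs debugfs mounted
# and the einj module loaded.
EINJ_DIR = "/sys/kernel/debug/apei/einj"

# EINJ error type value for "Processor Uncorrectable fatal".
# Assumption: this is the type used to provoke the fatal machine check.
PROCESSOR_UNCORRECTABLE_FATAL = 0x4


def available_types(einj_dir=EINJ_DIR):
    """Return the set of error type values the platform advertises."""
    types = set()
    with open(os.path.join(einj_dir, "available_error_type")) as f:
        for line in f:
            fields = line.split()
            if fields:
                # Each line starts with a hex value, e.g. "0x00000004".
                types.add(int(fields[0], 16))
    return types


def inject(error_type, einj_dir=EINJ_DIR, dry_run=True):
    """Select error_type and trigger it. Returns True if a write was made."""
    if error_type not in available_types(einj_dir):
        raise ValueError("error type %#x not supported here" % error_type)
    if dry_run:
        # Hypothetical safety switch: check support only, do not inject.
        return False
    with open(os.path.join(einj_dir, "error_type"), "w") as f:
        f.write("%#x\n" % error_type)
    # Writing any integer to error_inj triggers the injection.
    with open(os.path.join(einj_dir, "error_inj"), "w") as f:
        f.write("1\n")
    return True
```

Run as root, inject(PROCESSOR_UNCORRECTABLE_FATAL, dry_run=False) would be
expected to take the machine down on hardware that supports the error type, so
the default stays a dry run.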

>
> The idea I had is to just exercise the drm panic code (since we'd need to
> shunt everything else), and that can be done by calling the relevant
> functions from a hardirq context. And hardirq context is simple to get with an
> IPI to the local cpu. This way we don't depend upon the entire panic path to
> be recoverable, but only upon the drm bits being sane.

Yes, if it can be tested without rebooting then it'll be more efficient.
But einj does more than an IPI can: it injects hardware errors which trigger
exceptions in NMI context, and the exception handler usually panics on fatal
errors. The display may then be the only way to catch what has happened, so
I'm hoping that a future version will work in NMI context.

Thanks
Rui
