Re: 3.14-rc7 crashes in drm ([PATCH] a crash in mga_driver_irq_uninstall)

From: Daniel Vetter
Date: Mon Mar 24 2014 - 04:57:03 EST


On Mon, Mar 24, 2014 at 07:45:47AM +1000, Dave Airlie wrote:
> On Mon, Mar 24, 2014 at 7:27 AM, Andreas Mohr <andi@xxxxxxxx> wrote:
> > On Sun, Mar 23, 2014 at 09:39:16AM -0700, Linus Torvalds wrote:
> >> On Sun, Mar 23, 2014 at 5:15 AM, Andreas Mohr <andi@xxxxxxxx> wrote:
> >> >
> >> > which did end up flawless on 3.12.0-rc2+, too
> >> > (but failed to improve the issue on 3.14.0-rc7+).
> >> >
> >> > So, for all intents and purposes, drm infrastructure seems unavoidably
> >> > (neither dri disable nor libdrm upgrade helps) affected.
> >> > Does anyone know which change caused that issue?
> >> > (I'm asking because bisect here would be relatively painful).
> >>
> >> So 3.12-rc2 works. Does 3.13 work? Is this a regression in the current
> >> 3.14 rc only, or did it happen already in the previous release?
> >
> > Hmm, given that Mikulas in
> > https://lkml.org/lkml/2014/2/26/537
> > offered a diff of linux-3.13.5 files, it truly seems (shock! ack! noo!)
> > that that indeed may have been a regression at <= 3.13 proper even
> > (which may pose interesting questions about the level of testing coverage
> > we still enjoy [not!?] in this hardware area).
> >
> > Oh well, seems I'll have to prepare/build 3.13 now...
>
> It's > 15 year old hardware, so yes I believe we have close to 0
> testing coverage on it outside of distros,
>
> I'm not even sure I have one anymore, I might be able to test an MGA in one box.

I haven't done a full read of all the related code, but this smells like a
similar bug I've hit all over the place in the i810 driver (another one of
those undead drm drivers of yonders). Ingredients:

1) Xorg creates a drm mapping of the register space.
2) Xorg tells the hw-specific drm which drm mapping has the hw registers,
and the driver uses that. Iirc this has been done as some form of OS
abstraction. Also note that these mappings aren't refcounted, so the first
guy to call drm_rmmap wins.

-> All hell breaks loose if Xorg dies and takes all it's mappings with it
(in master_destroy, since the Xorg /dev fd is the master) and leaves the
driver hanging in the air if there's an interrupt still pending (or
anything else fwiw).

In my case with i810 it was some subtle thing with error codes somewhere
else (iirc, been a while) which made Xorg fall over a bit differently and
so crash the kernel. Presuming this is the case I think we need a proper
bisect here to figure out the root-cause and re-apply the lost duct-tape.

"Properly" fixing the underlying bug for any sane definition of "proper"
is impossible - the legacy drm driver model is just too broken for that.
And for new-style drivers killing the irq support when they don't expect
it is not cool, since those drivers are sane and assume full control over
all hw interactions (not like their legacy bethren).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/