Re: [PATCH] drm/rockchip: Allow driver to be shutdown on reboot/kexec

From: Brian Norris
Date: Wed Dec 05 2018 - 12:42:19 EST


Hi,

On Wed, Dec 05, 2018 at 02:28:48PM +0000, Marc Zyngier wrote:
> On 05/12/2018 14:11, Heiko Stübner wrote:
> > On Wednesday, 5 December 2018, 04:01:34 CET, Brian Norris wrote:
> >> On Sun, Aug 05, 2018 at 01:48:07PM +0100, Marc Zyngier wrote:
> >>> Leaving the DRM driver enabled on reboot or kexec has the annoying
> >>> effect of leaving the display generating transactions whilst the
> >>> IOMMU has been shut down.
> >>>
> >>> In turn, the IOMMU driver (which shares its interrupt line with
> >>> the VOP) starts warning either on shutdown or when entering the
> >>> secondary kernel in the kexec case (nothing is expected on that
> >>> front).
> >>>
> >>> A cheap way of ensuring that things are nicely shut down is to
> >>> register a shutdown callback in the platform driver.
> >>>
> >>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> >>> ---
> >>
> >> This patch made it into 4.20-rc1 as well as -stable, and it has caused
> >> regressions for me, on the Kevin and Scarlet [1] RK3399 platforms.
> >>
> >> On shutdown/reboot, I see this:
> >>
> >> [ 94.742559] WARNING: CPU: 4 PID: 2035 at
> >> drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x1c4/0x294
> >> ...
...
> >> Anyway, the above warnings occur on v4.20-rc, which I think is
> >> justification enough for a revert.
> >
> > That's strange. I remember testing quite a number of shutdown/reboot
> > cycles before applying that patch. And for good measure did the same
> > again right now.
> >
> > - Kevin, with netboot firmware, booting into Debian+console only
> > - Bob, with stock firmware, booting into Debian+KDE Plasma
> > - Scarlet, with stock firmware, booting into Debian+KDE Plasma
> >
> > With a random number of reboots and shutdowns on each, I didn't
> > see any warnings at all.
>
> And I've been using this very patch for quite a while now.
>
> Before suggesting a revert, I'd rather we understand what is going on,
> and why the DRM layer is crapping itself that badly for a legitimate
> operation (it is certainly better to have a shutdown than to let the VOP
> scan out crap once the IOMMU has been shut down). In short, don't shoot
> the messenger.

I honestly don't know much at all about DRM. But I do also see this
problem on 4.19.y (and probably 4.14.y), now that this patch has been
included there.

I'm fine with trying to "fix forward" in mainline, but unfortunately,
it's usually quite difficult to get Greg to drop things from -stable,
especially when the regression has already been pushed to a release.
That's why I'd propose a revert first, which can be sent back to -stable
while things are figured out.

I'm also willing to test any updates, if you have better suggestions.

> >> I plan to submit a revert which I hope can go to 4.20 as well as
> >> -stable. I'd hope the remove()/shutdown() paths should be fixed before
> >> this gets applied again, and that it does not get shipped to -stable
> >> kernels.
> >
> > But judging by the fact that the warning indicates that something is still
> > holding onto a framebuffer, and that "rmmod rockchipdrm" is likely not
> > possible at runtime for the same reason, I guess we really might be
> > creating a problem with that shutdown.
>
> That's a potential root cause.
>
> >
> > Can you maybe give "drm/rockchip: shutdown drm subsystem on shutdown" [2]
> > a try? When the underlying reboot issue surfaced, we had two competing
> > solutions, and trying the other one would at least avoid reopening the
> > issue of people having problems rebooting.

I'll try to give that a spin.

> kexec working is certainly something I need. And I'd like to understand
> why Brian sees this and nobody else.

For one, I'm actually running Chrome OS. My tests currently don't have
the full Chrome UI working, since Chrome OS has some basic graphics API
requirements and there's no Mali GPU driver upstream (so I get relegated
to our splash screen and console manager, frecon, instead). But some
people have a software-rendered llvmpipe backend working, and they
likely would see the same problem.

Maybe common Linux distros treat "no GPU" too simplistically and don't
really exercise the DRM framework much. I dunno.

Brian