Re: [PATCH] drm/rockchip: Allow driver to be shutdown on reboot/kexec
From: Marc Zyngier
Date: Wed Dec 05 2018 - 09:28:54 EST
Hi all,
On 05/12/2018 14:11, Heiko StÃbner wrote:
> Hi Brian,
>
> Am Mittwoch, 5. Dezember 2018, 04:01:34 CET schrieb Brian Norris:
>> + others
>>
>> Hi,
>>
>> On Sun, Aug 05, 2018 at 01:48:07PM +0100, Marc Zyngier wrote:
>>> Leaving the DRM driver enabled on reboot or kexec has the annoying
>>> effect of leaving the display generating transactions whilst the
>>> IOMMU has been shut down.
>>>
>>> In turn, the IOMMU driver (which shares its interrupt line with
>>> the VOP) starts warning either on shutdown or when entering the
>>> secondary kernel in the kexec case (nothing is expected on that
>>> front).
>>>
>>> A cheap way of ensuring that things are nicely shut down is to
>>> register a shutdown callback in the platform driver.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>>> ---
>>
>> This patch made it into 4.20-rc1 as well as -stable, and it has caused
>> regressions for me, on the Kevin and Scarlet [1] RK3399 platforms.
>>
>> On
>> shutdown/reboot, I see this:
>>
>> [ 94.742559] WARNING: CPU: 4 PID: 2035 at
>> drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x1c4/0x294
>> ...
>> [ 94.775904] CPU: 4 PID: 2035 Comm: reboot Tainted: G W
>> 4.20.0-rc5+ #83 [ 94.784651] Hardware name: Google Scarlet (DT)
>> [ 94.789611] pstate: 20000005 (nzCv daif -PAN -UAO)
>> [ 94.794959] pc : drm_mode_config_cleanup+0x1c4/0x294
>> [ 94.800500] lr : drm_mode_config_cleanup+0x108/0x294
>> ...
>> [ 94.898683] Call trace:
>> [ 94.901410] drm_mode_config_cleanup+0x1c4/0x294
>> [ 94.906565] rockchip_drm_unbind+0x4c/0x8c
>> [ 94.911138] component_master_del+0x88/0xb8
>> [ 94.915807] rockchip_drm_platform_remove+0x2c/0x44
>> [ 94.921243] rockchip_drm_platform_shutdown+0x20/0x2c
>> [ 94.926881] platform_drv_shutdown+0x2c/0x38
>> [ 94.931647] device_shutdown+0x164/0x1b8
>> [ 94.936016] kernel_restart_prepare+0x40/0x48
>> [ 94.940878] kernel_restart+0x20/0x68
>> [ 94.944964] __se_sys_reboot+0x1ac/0x204
>> [ 94.949331] __arm64_sys_reboot+0x2c/0x38
>> [ 94.953806] el0_svc_common+0xa4/0xec
>> [ 94.957891] el0_svc_compat_handler+0x30/0x3c
>> [ 94.962753] el0_svc_compat+0x8/0x18
>> [ 94.966740] ---[ end trace b9ba2e701f4fb233 ]---
>> [ 95.255169] Memory manager not clean during takedown.
>> [ 95.260824] WARNING: CPU: 4 PID: 2035 at drivers/gpu/drm/drm_mm.c:950
>> drm_mm_takedown+0x34/0x44 ...
>> [ 95.292314] CPU: 4 PID: 2035 Comm: reboot Tainted: G W
>> 4.20.0-rc5+ #83 [ 95.301061] Hardware name: Google Scarlet (DT)
>> [ 95.306020] pstate: 60000005 (nZCv daif -PAN -UAO)
>> [ 95.311369] pc : drm_mm_takedown+0x34/0x44
>> [ 95.315940] lr : drm_mm_takedown+0x34/0x44
>> ...
>> [ 95.415857] drm_mm_takedown+0x34/0x44
>> [ 95.420042] rockchip_drm_unbind+0x64/0x8c
>> [ 95.424613] component_master_del+0x88/0xb8
>> [ 95.429283] rockchip_drm_platform_remove+0x2c/0x44
>> [ 95.434728] rockchip_drm_platform_shutdown+0x20/0x2c
>> [ 95.440360] platform_drv_shutdown+0x2c/0x38
>> [ 95.445127] device_shutdown+0x164/0x1b8
>> [ 95.449504] kernel_restart_prepare+0x40/0x48
>> [ 95.454358] kernel_restart+0x20/0x68
>> [ 95.458436] __se_sys_reboot+0x1ac/0x204
>> [ 95.462812] __arm64_sys_reboot+0x2c/0x38
>> [ 95.467287] el0_svc_common+0xa4/0xec
>> [ 95.471373] el0_svc_compat_handler+0x30/0x3c
>> [ 95.476235] el0_svc_compat+0x8/0x18
>> [ 95.480215] ---[ end trace b9ba2e701f4fb234 ]---
>>
>> It's especially bad on -stable kernels, where I believe the remove()
>> paths were even worse. This triggers a variety of OOPSes, and it's not
>> clear if those are simply because of backports (e.g., RK3399 did not
>> have support in 4.4.y, but our downstream has merged all sorts of
>> backports to make it work).
>>
>> Anyway, the above warnings occur on v4.20-rc, which I think is
>> justification enough for a revert.
>
> That's strange. I remember testing quite a number of shutdown/reboot
> cycles before applying that patch. And for good measure did the same
> again right now.
>
> - Kevin, with netboot firmware, booting into Debian+console only
> - Bob, with stock firmware, booting into Debian+KDE Plasma
> - Scarlet, with stock firmware, booting into Debian+KDE Plasma
>
> With some random number of reboot and shutdowns on each I didn't
> see any warnings at all.
And I've been using this very patch for quite a while now.
Before suggesting a revert, I'd rather we understand what is going on,
and why is the DRM layer crapping itself that badly for a legitimate
operation (it is certainly better to have a shutdown than to let the VOP
scan out crap once the IOMMU has been shut down). In short, don't shoot
the messenger.
>
>> I plan to submit a revert which I hope can go to 4.20 as well as
>> -stable. I'd hope the remove()/shutdown() paths should be fixed before
>> this gets applied again, and that it does not get shipped to -stable
>> kernels.
>
> But judging by the fact that the warning indicates that somthing is still
> holding onto a framebuffer and a rmmod rockchipdrm is not possible
> at runrtime for likely the same reason, I guess we really might be creating
> a problem with that shutdown.
That's a potential root cause.
>
> Can you maybe give "drm/rockchip: shutdown drm subsystem on shutdown" [2]
> a try? When the underlying issue of rebooting surfaced we had 2 competing
> solutions, so we at least don't reopen the issue, that people have problems
> rebooting?
kexec working is certainly something I need. And I'd like to understand
why Brian sees this and nobody else.
Thanks,
M.
--
Jazz is not dead. It just smells funny...