Re: [PATCH RFC] regmap: maple: Switch to use irq-safe locking

From: Andy Yan
Date: Tue Aug 20 2024 - 11:47:50 EST


Hi Mark and Cristian,
also cc Sasha,

On 8/15/24 03:04, Mark Brown wrote:
On Wed, Aug 14, 2024 at 01:20:21AM +0300, Cristian Ciocaltea wrote:

Commit 3d59c22bbb8d ("drm/rockchip: vop2: Convert to use maple tree
register cache") enabled the use of maple tree register cache in
Rockchip VOP2 driver. However, building the kernel with lockdep support
indicates locking rules violation when trying to unload the rockchipdrm
module:

[ 48.360258] ========================================================
[ 48.360829] WARNING: possible irq lock inversion dependency detected
[ 48.361400] 6.11.0-rc1 #40 Not tainted
[ 48.361743] --------------------------------------------------------
[ 48.362311] modprobe/685 just changed the state of lock:
[ 48.362790] ffff0000087fa798 (&mt->ma_lock){+...}-{2:2}, at: regcache_maple_exit+0x6c/0xe0

Please think hard before including complete backtraces in upstream
reports, they are very large and contain almost no useful information
relative to their size so often obscure the relevant content in your
message. If part of the backtrace is usefully illustrative (it often is
for search engines if nothing else) then it's usually better to pull out
the relevant sections.

The problem is that the regmap lock could be taken by an IRQ context,
interrupting the irq-unsafe maple tree lock, which may result in a lock
inversion deadlock scenario.

Switch to use irq-safe locking in the maple tree register cache.

I'd have a bigger question here which is why the driver is using a
dynamically allocated register cache in a hardirq context, especially
with no defaults provided? Anything except the flat cache might do

Do you mean the current code we call devm_regmap_init_mmio in vop2_bind function ?

allocations at runtime which might include in interrupt context unless
the caller is very careful and since the lockdep warning triggered it's
clear that this driver isn't. The core will be doing atomic allocations
for MMIO but that's not something we want to be doing as a matter of
course... I would generally expect drivers to try to ensure that any
registers are cached outside of the interrupt handler, usually by
specifying defaults or touching all registers during setup.

Without having done a full analysis it also looks like the marking of
volatile registers isn't right, it's not immediately clear that the
interrupt status and clear registers are volatile and they ought to be.
None of the registers accessed in interrupt context look like they
should be cached at all unless there's something triggered via the DRM
vblank calls.


I think the interrupt status and clear registers should also be marked as volatile.
But this is not releated to the current issue, right?
It might be safer to fall back to the rbtree cache for this device since
rbtree doesn't force an extra level of locking on us, though like I say

I also think fall back to rbtree would work. I had a similar thought the first
time I encountered this issue[0]. But i try to keep maple tree as is based
on a much more modern data structure.

I'm not convinced that what the driver is doing with caching is a super
good idea. Though probably what the driver is doing should work.
The registers of VOP are quite special: Each write operation to the register
does not take effect immediately, it only take effect after the next VBLANK if
we write the CFGONE register.
So we need a cache to record what we wrote to register before.




[0]https://patchwork.kernel.org/project/linux-rockchip/patch/20231217084415.2373043-1-andyshrk@xxxxxxx/