Re:Re: [PATCH RFC] regmap: maple: Switch to use irq-safe locking

From: Andy Yan
Date: Mon Aug 19 2024 - 06:20:49 EST



Hi Cristian,

At 2024-08-17 04:11:27, "Cristian Ciocaltea" <cristian.ciocaltea@xxxxxxxxxxxxx> wrote:
>On 8/14/24 10:04 PM, Mark Brown wrote:
>> On Wed, Aug 14, 2024 at 01:20:21AM +0300, Cristian Ciocaltea wrote:
>
>[...]
>
>> I'd have a bigger question here which is why the driver is using a
>> dynamically allocated register cache in a hardirq context, especially
>> with no defaults provided? Anything except the flat cache might do
>> allocations at runtime which might include in interrupt context unless
>> the caller is very careful and since the lockdep warning triggered it's
>> clear that this driver isn't. The core will be doing atomic allocations
>> for MMIO but that's not something we want to be doing as a matter of
>> course... I would generally expect drivers to try to ensure that any
>> registers are cached outside of the interrupt handler, usually by
>> specifying defaults or touching all registers during setup.
>>
>> Without having done a full analysis it also looks like the marking of
>> volatile registers isn't right, it's not immediately clear that the
>> interrupt status and clear registers are volatile and they ought to be.
>> None of the registers accessed in interrupt context look like they
>> should be cached at all unless there's something triggered via the DRM
>> vblank calls.
>
>AFAIKT, all registers accessed in IRQ context are volatile, hence the
>register cache should not be involved at that point.
>
>The deadlock scenario indicated by lockdep actually points to the lock
>acquired by regcache_maple_exit(), which has been triggered during module
>unload operation, and the lock acquired by regcache_maple_write(), in the
>context of vop2_plane_atomic_update() called within the DRM stack.
>
>[ 48.466666] -> (&mt->ma_lock){+...}-{2:2} {
>[ 48.467066] HARDIRQ-ON-W at:
>[ 48.467360] lock_acquire+0x1d4/0x320
>[ 48.467849] _raw_spin_lock+0x50/0x70
>[ 48.468337] regcache_maple_exit+0x6c/0xe0
>[ 48.468864] regcache_exit+0x8c/0xa8
>[ 48.469344] regmap_exit+0x24/0x160
>[ 48.469815] devm_regmap_release+0x1c/0x28
>[ 48.470339] release_nodes+0x68/0xa8
>[ 48.470818] devres_release_group+0x120/0x180
>[ 48.471364] component_unbind+0x54/0x70
>[ 48.471867] component_unbind_all+0xb0/0xe8
>[ 48.472400] rockchip_drm_unbind+0x44/0x80 [rockchipdrm]
>[ 48.473059] component_del+0xc8/0x158
>[ 48.473545] dw_hdmi_rockchip_remove+0x28/0x40 [rockchipdrm]
>
>[...]
>
>[ 48.482058] INITIAL USE at:
>[ 48.482344] lock_acquire+0x1d4/0x320
>[ 48.482824] _raw_spin_lock+0x50/0x70
>[ 48.483304] regcache_maple_write+0x27c/0x330
>[ 48.483844] regcache_write+0x6c/0x88
>[ 48.484323] _regmap_read+0x198/0x1c8
>[ 48.484801] _regmap_update_bits+0xc0/0x148
>[ 48.485327] regmap_field_update_bits_base+0x74/0xb0
>[ 48.485919] vop2_plane_atomic_update+0x9e8/0x1490 [rockchipdrm]
>[ 48.486631] drm_atomic_helper_commit_planes+0x190/0x2f8 [drm_kms_helper]
>
>I experimented with a reduced scope of this patch by limiting the use of
>the irq-safe lock to regcache_maple_exit() only, and I can confirm this
>was enough to make lockdep happy.
>
>> It might be safer to fall back to the rbtree cache for this device since
>> rbtree doesn't force an extra level of locking on us, though like I say
>> I'm not convinced that what the driver is doing with caching is a super
>> good idea. Though probably what the driver is doing should work.
>
>I actually gave the flat cache a try on a Rock 3A board and didn't
>encounter any (obvious) issues, but my testing capabilities are rather
>limited at the moment.
>
>@Andy: Could you, please, shed some light on the topic? i.e. the rational
>behind going for an rbtree cache over a flat one, since the latter would be
>better suited for MMIO devices.

I have encountered a similar issue when I add support for rk3588[0]

Now i can see this issue when rockchipdrm load with:
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCKDEP=y

But I can't reproduce this issue at unload (with cmd: rmmod rockchipdrm)。
I need to take a deeper look to understanding the detail。


[0]https://patchwork.kernel.org/project/linux-rockchip/patch/20231217084415.2373043-1-andyshrk@xxxxxxx/



>
>> My first thought here is that if we've got a regmap using spinlocks for
>> the regmap lock and a maple tree cache we should arrange things so that
>> the maple tree lock is used for the regmap's lock. That would however
>> involve some unpleasant abstraction violation, and possibly some macro
>> fun since we'd need to elide the locking from the cache itself when
>> using the same lock at the regmap level. I think that's going to be a
>> case of choosing the least unpleasant option.
>
>Thanks, Mark, for the detailed feedback on this!
>
>Regards,
>Cristian
>
>_______________________________________________
>Linux-rockchip mailing list
>Linux-rockchip@xxxxxxxxxxxxxxxxxxx
>http://lists.infradead.org/mailman/listinfo/linux-rockchip