Re: Hung tasks due to a AB-BA deadlock between the leds_list_lock rwsem and the rtnl mutex

From: Hans de Goede
Date: Thu Jun 06 2024 - 08:02:11 EST


Hi all,

On 5/31/24 2:54 PM, Andrew Lunn wrote:
>> I actually have been looking at a ledtrig-netdev lockdep warning yesterday
>> which I believe is the same thing. I'll include the lockdep trace below.
>>
>> According to lockdep there indeed is a ABBA (ish) cyclic deadlock with
>> the rtnl mutex vs led-triggers related locks. I believe that this problem
>> may be a pre-existing problem but this now actually gets hit in kernels >=
>> 6.9 because of commit 66601a29bb23 ("leds: class: If no default trigger is
>> given, make hw_control trigger the default trigger"). Before that commit
>> the "netdev" trigger would not be bound / set as phy LEDs trigger by default.
>>
>> +Cc Heiner Kallweit who authored that commit.
>>
>> The netdev trigger typically is not needed because the PHY LEDs are typically
>> under hw-control and the netdev trigger even tries to leave things that way
>> so setting it as the active trigger for the LED class device is basically
>> a no-op. I guess the goal of that commit is correctly have the triggers
>> file content reflect that the LED is controlled by a netdev and to allow
>> changing the hw-control mode without the user first needing to set netdev
>> as trigger before being able to change the mode.
>
> It was not the intention that this triggers is loaded for all
> systems.

<snip>

> Reverting this patch does seem like a good way forward, but i would
> also like to give Heiner a little bit of time to see if he has a quick
> real fix.

So it has been almost a week with no reply from Heiner. Since this is
causing real issues for users out there, I think a revert of 66601a29bb23
should be submitted to Linus and then backported to the stable kernels
to fix the immediate issue at hand.

Once the underlying locking issue, which is the real root cause here,
is fixed, we can reconsider re-applying 66601a29bb23.

Regards,

Hans