Re: of_node_put() usage is buggy all over drivers/of/base.c?!

From: Frank Rowand
Date: Mon Aug 16 2021 - 16:00:56 EST



Hit send too soon, a couple of cleanups:

On 8/16/21 2:56 PM, Frank Rowand wrote:
> On 8/16/21 2:20 PM, Rob Herring wrote:
>> On Mon, Aug 16, 2021 at 10:14 AM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
>>>
>>> On 8/16/21 9:46 AM, Vladimir Oltean wrote:
>>>> Hi Frank,
>>>>
>>>> On Mon, Aug 16, 2021 at 09:33:03AM -0500, Frank Rowand wrote:
>>>>> Hi Vladimir,
>>>>>
>>>>> On 8/13/21 8:01 PM, Vladimir Oltean wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I was debugging an RCU stall which happened during the probing of a
>>>>>> driver. Activating lock debugging, I see:
>>>>>
>>>>> I took a quick look at sja1105_mdiobus_register() in v5.14-rc1 and v5.14-rc6.
>>>>>
>>>>> Looking at the following stack trace, I did not see any calls to
>>>>> of_find_compatible_node() in sja1105_mdiobus_register(). I am
>>>>> guessing that maybe there is an inlined function that calls
>>>>> of_find_compatible_node(). This would likely be either
>>>>> sja1105_mdiobus_base_tx_register() or sja1105_mdioux_base_t1_register().
>>>>
>>>> Yes, it is sja1105_mdiobus_base_t1_register which is inlined.
>>>>
>>>>>>
>>>>>> [ 101.710694] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938
>>>>>> [ 101.719119] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1534, name: sh
>>>>>> [ 101.726763] INFO: lockdep is turned off.
>>>>>> [ 101.730674] irq event stamp: 0
>>>>>> [ 101.733716] hardirqs last enabled at (0): [<0000000000000000>] 0x0
>>>>>> [ 101.739973] hardirqs last disabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98
>>>>>> [ 101.748146] softirqs last enabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98
>>>>>> [ 101.756313] softirqs last disabled at (0): [<0000000000000000>] 0x0
>>>>>> [ 101.762569] CPU: 4 PID: 1534 Comm: sh Not tainted 5.14.0-rc5+ #272
>>>>>> [ 101.774558] Call trace:
>>>>>> [ 101.794734] __might_sleep+0x50/0x88
>>>>>> [ 101.798297] __mutex_lock+0x60/0x938
>>>>>> [ 101.801863] mutex_lock_nested+0x38/0x50
>>>>>> [ 101.805775] kernfs_remove+0x2c/0x50 <---- this takes mutex_lock(&kernfs_mutex);
>>>>>> [ 101.809341] sysfs_remove_dir+0x54/0x70
>>>>>
>>>>> The __kobject_del() occurs only if the refcount on the node
>>>>> becomes zero. This should never be true when of_find_compatible_node()
>>>>> calls of_node_put() unless a "from" node is passed to of_find_compatible_node().
>>>>
>>>> I figured that was the assumption, that the of_node_put would never
>>>> trigger a sysfs file / kobject deletion from there.
>>>>
>>>>> In both sja1105_mdiobus_base_tx_register() and sja1105_mdioux_base_t1_register()
>>>>> a from node ("mdio") is passed to of_find_compatible_node() without first doing an
>>>>> of_node_get(mdio). If you add the of_node_get() calls the problem should be fixed.
>>>>
>>>> The answer seems simple enough, but stupid question, but why does
>>>> of_find_compatible_node call of_node_put on "from" in the first place?
>>>
>>> Actually a good question.
>>>
>>> I do not know why of_find_compatible_node() calls of_node_put() instead of making
>>> the caller of of_find_compatible_node() responsible. That pattern was created
>>> long before I was involved in devicetree and I have not gone back to read the
>>> review comments of when that code was created.
>>
>
>> Because it is an iterator function and they all drop the ref from the
>> prior iteration.
>
> That is what I was expecting before reading through the code. But instead
> I found of_find_compatible_node():
>
> raw_spin_lock_irqsave(&devtree_lock, flags);
> for_each_of_allnodes_from(from, np)
> if (__of_device_is_compatible(np, compatible, type, NULL) &&
> of_node_get(np))
> break;
> of_node_put(from);
> raw_spin_unlock_irqrestore(&devtree_lock, flags);
>
>
> for_each_of_allnodes_fromir:

for_each_of_allnodes_from():

>
> #define for_each_of_allnodes_from(from, dn) \
> for (dn = __of_find_all_nodes(from); dn; dn = __of_find_all_nodes(dn))
>
>
> and __of_find_all_nodes() is:
>
> struct device_node *__of_find_all_nodes(struct device_node *prev)
> {
> struct device_node *np;
> if (!prev) {
> np = of_root;
> } else if (prev->child) {
> np = prev->child;
> } else {
> /* Walk back up looking for a sibling, or the end of the structure */
> np = prev;
> while (np->parent && !np->sibling)
> np = np->parent;
> np = np->sibling; /* Might be null at the end of the tree */
> }
> return np;
> }
>
>
> So the iterator is not using of_node_get() and of_node_put() for each
> node that is traversed. The protection against a node disappearing
> during the iteration is provided by holding devtree_lock.
>
>>
>> I would say any open coded call where from is not NULL is an error.
>
> I assume you mean any open coded call of of_find_compatible_node(). There are
> at least a couple of instances of that. I did only a partial grep while looking
> at Vladimir's issue.
>
> Doing the full grep now, I see 13 instances of architecture and driver code calling
> of_find_compatible_node().

of_find_compatible_node() with parameter "from" not NULL.

>
>> It's not reliable because the DT search order is not defined and could
>> change. Someone want to write a coccinelle script to check that?
>>
>
>> The above code should be using of_get_compatible_child() instead.
>
> Yes, of_get_compatible_child() should be used here. Thanks for pointing
> that out.
>
> There are 13 instances of architecture and driver code calling
> of_find_compatible_node(). If possible, it would be good to change all of
> them to of_get_compatible_child(). If we could replace all driver
> usage of of_find_compatible_node() with a from parameter of NULL to
> a new wrapper without a from parameter, where the wrapper calls
> of_find_compatible_node() with the from parameter set to NULL, then
> we could prevent this problem from recurring.
>
> (I did not look at all 13 instances yet, to see if this can be done.)
>
>>
>> Rob
>>
>