Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing

From: H. Nikolaus Schaller
Date: Tue Jun 19 2018 - 00:51:46 EST


Hi Tony,

> Am 19.06.2018 um 06:34 schrieb Tony Lindgren <tony@xxxxxxxxxxx>:
>
> * H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> [180618 18:33]:
>>>> So code just needs group cleanup on failed probing and fixing the mutex around pinctrl_generic_add_group().
>>>>
>>>> I think we need the mutex because a race still can happen when create_pinctrl() is calling pcs_dt_node_to_map()
>>>> and pinctrl_generic_add_group() w/o being locked on pinctrl_maps_mutex.
>>>>
>>>> The race I suspect is that two drivers are trying to insert the same name and may come
>>>> both to the conclusion that it does not yet exist. And both insert into the radix tree.
>>>>
>>>> The window of risk is small though... It is in pinctrl_generic_add_group() between calling
>>>> pinctrl_generic_group_name_to_selector() and radix_tree_insert() so we probably won't
>>>> see it in real hardware tests.
>>>
>>> Hmm but that race should be already fixed with mutex held
>>> by the pin controller drivers with these fixes? Or am I
>>> missing something still?
>>
>> Hm. Maybe we refer to a different mutex?
>
> Yes I think that's the case, you're talking about a different
> mutex here :)
>
>> I had seen the call sequence
>>
>> create_pinctrl()-> pinctrl_dt_to_map() -> pcs_dt_node_to_map() -> pinctrl_generic_add_group()
>>
>> w/o any lock inside.
>>
>> There is a mutex_lock(&pinctrl_maps_mutex); in create_pinctrl(), but locked after that.
>>
>> Or is there a lock outside of create_pinctrl()?
>>
>> If I look into the stack dumps, call nesting is
>>
>> driver_probe_device() -> pinctrl_bind_pins() -> devm_pinctrl_get() -> create_pinctrl()
>>
>> They all do no locking.
>>
>> Maybe I am missing something.
>
> Can you please post a patch for that as you already have it
> debugged? That's easier to understand than reading a verbal
> patch :)

I have no patch for it and all tests were without, but I can suggest a change which IMHO
could solve it:

diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
index eb2b217f5e1e..7d125f9a7804 100644
--- a/drivers/pinctrl/core.c
+++ b/drivers/pinctrl/core.c
@@ -1037,15 +1037,16 @@ static struct pinctrl *create_pinctrl(struct device *dev,
INIT_LIST_HEAD(&p->states);
INIT_LIST_HEAD(&p->dt_maps);

+ mutex_lock(&pinctrl_maps_mutex);
ret = pinctrl_dt_to_map(p, pctldev);
if (ret < 0) {
kfree(p);
+ mutex_unlock(&pinctrl_maps_mutex);
return ERR_PTR(ret);
}

devname = dev_name(dev);

- mutex_lock(&pinctrl_maps_mutex);
/* Iterate over the pin control maps to locate the right ones */
for_each_maps(maps_node, i, map) {
/* Map must be for this device */

Description: we should also protect pinctrl_dt_to_map(), which calls pinctrl_generic_add_group()
and the calls inside pinctrl_generic_add_group() to pinctrl_generic_group_name_to_selector()
and radix_tree_insert() with a mutex.

But I haven't tested this case (because it is unlikely to happen and be testable).

BR,
Nikolaus