Re: sun50i-a64-pinctrl WARN_ON drivers/base/dd.c:349

From: Andrà Przywara
Date: Sat Apr 29 2017 - 19:41:01 EST


On 29/04/17 22:28, Adam Borowski wrote:
> On Fri, Apr 28, 2017 at 06:03:14PM -0400, Tejun Heo wrote:
>> On Tue, Apr 18, 2017 at 10:12:16AM +0100, Andre Przywara wrote:
>>> Yeah, so I stack-dumped on the zero allocations and indeed they are
>>> called from cleanup functions:
>>> drivers/pinctrl/pinmux.c:pinmux_generic_free_functions():
>>> devm_kzalloc(sizeof(*indices) * pctldev->num_functions, ...)
>>> (and another one I don't know from the top of the my head, logs at home)
>>>
>>> So my hunch was that once EPROBE_DEFER triggers the devres cleanup, it
>>> uses some reverse list traversal to release all allocated resources
>>> (backwards!), so missing those which get (appended) during the process.
>>> But I don't think that would not work with the locking.
>>> So I have to dig deeper tonight in my logs.
>>
>> If this is a valid use case, we can change devm to repeat till empty
>> but it's a weird thing to do to allocate from a release function.
>>
>> So, something like this. Only compile tested.

I was wondering if using devm_*alloc in a _release_ function is valid at
all, given that it is called as part of the DEFER clean-up routine.
Looking at pinmux_generic_free_functions() it looks like we could
replace it with a (non-devm_) kmalloc version and it would still work
(given we add a kfree at the end).
Either that or we bail out early if pctldev->num_functions is zero.

> I've tested this, and the results are mixed. It _does_ make the WARN go
> away, but on the other hand, much later during the boot, another driver
> (sun8i-emac) gets deferred and never undefers.
>
> Without the patch, there's:
> [ ok ] Synthesizing the initial hotplug events...done.
> [ 6.666984] bus: 'platform': driver_probe_device: matched device 1c30000.ethernet with driver sun8i-emac
> [ 6.676520] bus: 'platform': really_probe: probing driver sun8i-emac with device 1c30000.ethernet
> [ 6.686479] driver: 'sun8i-emac': driver_bound: bound to device '1c30000.ethernet'
> [....] [ 6.694132] bus: 'platform': really_probe: bound device 1c30000.ethernet to driver sun8i-emac
> [ ok ng for /dev to be fully populated...done.
> <snip>
> [....] Configuring network interfaces...[ 11.414884] libphy: 1c30000.ethernet: probed
> [ 11.449742] bus: 'mdio_bus': driver_probe_device: matched device 1c30000.ethernet-0:00 with driver RTL8211E Gigabit Ethernet
> [ 11.461013] bus: 'mdio_bus': really_probe: probing driver RTL8211E Gigabit Ethernet with device 1c30000.ethernet-0:00
> [ 11.471666] driver: 'RTL8211E Gigabit Ethernet': driver_bound: bound to device '1c30000.ethernet-0:00'
> [ 11.481007] bus: 'mdio_bus': really_probe: bound device 1c30000.ethernet-0:00 to driver RTL8211E Gigabit Ethernet
> [ 11.491527] RTL8211E Gigabit Ethernet 1c30000.ethernet-0:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=1c30000.ethernet-0:00, irq=-1)
> [ 11.505608] sun8i-emac 1c30000.ethernet: device MAC address slot 0 02:ba:8f:9e:44:8f
>
> With the patch:
> [ ok ] Synthesizing the initial hotplug events...done.
> [....] Waiting for /dev to be fully populated...[ 6.325314] bus: 'platform': driver_probe_device: matched device 1c30000.ethernet with driver sun8i-emac
> [ 6.334842] bus: 'platform': really_probe: probing driver sun8i-emac with device 1c30000.ethernet
> [ 6.344253] platform 1c30000.ethernet: Driver sun8i-emac requests probe deferral
> [ 6.351662] platform 1c30000.ethernet: Added to deferred list
> done.
> <snip>
> [....] Configuring network interfaces...Cannot find device "eth0"
> ifup: failed to bring up eth0
> failed.
>
>
> There's a concern, though, that this second driver is not in mainline (and
> won't ever be -- it's being replaced by dwmac-sun8i, but the latter is not
> yet up to scratch), thus I'm not sure how relevant this is.
>
> The second concern is that I'm using the same tree
> (icenowy/sunxi64-4.11-rc7) and config (plus filesystems) as Icenowy, on the
> same hardware, yet I did see this warning in versions much earlier than her,
> and it doesn't go away in -next for me either. All that differs are
> toolchain versions (I'm on Debian unstable) and perhaps some u-boot bits.

My understanding is that the probing order is determined during link
time of the kernel - at least for functions using the same initcall
level. For drivers in the same subsystem this is mostly set by the order
in the Makefile, but for drivers from different subsystems their order
might by slightly different depending on parallel compilation. So on
some builds the pinctrl driver gets called late and everything is fine,
whereas on other build it gets called early and returns with -EDEFER,
leading to the issue.

> Because of the amount of churn in sunxi land and the driver's mainlining
> status, I think it'd be good to wait after the merge window that starts
> tomorrow, and re-test then.

Well, I see this issue with pure mainline and defconfig, so no sunxi
specialities here.

Will do more investigation tomorrow.

Cheers,
Andre.

>
> Full console output with the patch applied attached.
>
>
> Meow!
>
>
>> diff --git a/drivers/base/devres.c b/drivers/base/devres.c
>> index 71d5770..d2a9f34 100644
>> --- a/drivers/base/devres.c
>> +++ b/drivers/base/devres.c
>> @@ -509,13 +509,21 @@ static int release_nodes(struct device *dev, struct list_head *first,
>> int devres_release_all(struct device *dev)
>> {
>> unsigned long flags;
>> + int cnt = 0, ret;
>>
>> /* Looks like an uninitialized device structure */
>> if (WARN_ON(dev->devres_head.next == NULL))
>> return -ENODEV;
>> - spin_lock_irqsave(&dev->devres_lock, flags);
>> - return release_nodes(dev, dev->devres_head.next, &dev->devres_head,
>> - flags);
>> +
>> + /* Release callbacks may create new nodes, repeat till empty */
>> + do {
>> + spin_lock_irqsave(&dev->devres_lock, flags);
>> + ret = release_nodes(dev, dev->devres_head.next,
>> + &dev->devres_head, flags);
>> + cnt += ret;
>> + } while (ret);
>> +
>> + return cnt;
>> }
>>
>> /**
>>
>