Re: pci: kernel crash in bus_find_device

From: Guenter Roeck
Date: Wed May 21 2014 - 19:01:50 EST


On Wed, May 21, 2014 at 01:04:04PM -0700, Francesco Ruggeri wrote:
> I have been using an x86 platform.
> When I started working on it I got early crashes until I added the
> check for p not NULL in
>
> +void bus_release_device(struct device *dev)
> +{
> +	struct device_private *p = dev->p;
> +
> +	if (p && klist_node_attached(&p->knode_bus))
> +		klist_put_last(&p->knode_bus);
> +}
> +
>
> Maybe on powerpc *p is overridden between device_del and device_release?
>
> Or maybe some of the BUG_ONs in the patch? The ones on knode_dead are
> treated as WARN_ONs in the current klist code.
> The one in BUG_ON(!klist_dec_and_del(n)); is new, and in my tests I
> ran into it without the second patch (but only when I ran my module
> and tests).
>
Hi Francesco,

I replaced the BUG_ON with WARN_ON; still crashes.

Anyway, the problem seems to be known. I found two related exchanges.

[1] describes pretty much the same problem. I don't see if/where it was
ever fixed, though.

[2] is a patch to fix the problem. It did not apply cleanly to 3.14,
so I had to make some adjustments in klist_iter_init_node. The resulting
patch is below. With this patch, the problem is gone. It is not perfect,
as it aborts the loop if it encounters a deleted kobject, but it is better
than nothing. Unfortunately, the patch never made it upstream; no idea why.
Copying the author and Greg to get additional feedback.
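
To illustrate the "aborts the loop" behavior, here is a rough userspace
sketch. It is a toy model only, not the kernel klist code and not the patch
itself: struct node, struct iter, node_get_unless_zero, iter_init_node,
iter_next and count_from are all hypothetical names, and a refcount of 0
stands in for "node has been deleted". The iterator takes a reference only
on nodes that are still alive, and stops as soon as it would have to step
onto a deleted one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a list node; refcount == 0 models a deleted node. */
struct node {
	struct node *next;
	int refcount;
	int value;
};

/* The iterator holds one reference on the node it will return next. */
struct iter {
	struct node *cur;
};

/* Take a reference only if the node exists and is still alive. */
static bool node_get_unless_zero(struct node *n)
{
	if (n == NULL || n->refcount == 0)
		return false;
	n->refcount++;
	return true;
}

static void node_put(struct node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

/* Start at 'start'; if it was already deleted, the iterator is empty. */
static void iter_init_node(struct iter *it, struct node *start)
{
	it->cur = node_get_unless_zero(start) ? start : NULL;
}

/*
 * Return the current node and advance. If the next node has been deleted,
 * the iteration ends there instead of touching it. Nodes in this model are
 * owned by the caller, so returning 'n' after dropping the iterator's
 * reference is safe here; real code would need to keep it pinned.
 */
static struct node *iter_next(struct iter *it)
{
	struct node *n = it->cur;

	if (n == NULL)
		return NULL;
	it->cur = node_get_unless_zero(n->next) ? n->next : NULL;
	node_put(n);
	return n;
}

/* Count how many nodes a walk starting at 'start' actually visits. */
static int count_from(struct node *start)
{
	struct iter it;
	int seen = 0;

	iter_init_node(&it, start);
	while (iter_next(&it) != NULL)
		seen++;
	return seen;
}
```

In this model, a walk over a -> b -> c visits all three nodes while they are
alive, only a once b has been deleted (c is never reached, which is the
imperfection mentioned above), and nothing if the start node itself is gone.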

Guenter

[1] https://lkml.org/lkml/2008/10/26/79
[2] https://lkml.org/lkml/2012/4/16/218

----