Re: pci: kernel crash in bus_find_device
From: Francesco Ruggeri
Date: Thu May 22 2014 - 12:19:46 EST
Aborting a search does not sound like a correct solution.
How does a higher-level user (e.g. for_each_pci_dev) know that a search
was aborted and decide whether it should try again, assuming it would
be ok repeating the action on the devices visited the first time?
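To illustrate the concern, here is a minimal userspace sketch. It is not
kernel code and does not use the klist API; demo_node, demo_find and
demo_search are hypothetical names made up for this example. It assumes an
iterator that simply stops when it meets a node that was deleted during the
walk, which is how I read the behaviour of the proposed patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list node; NOT the kernel's struct klist_node. */
struct demo_node {
	int id;
	bool dead;		/* models a node deleted while the walk is in progress */
	struct demo_node *next;
};

/*
 * Walk the list looking for 'id'. If a dead node is encountered the walk
 * is aborted, mirroring the assumed behaviour of the patch.
 */
struct demo_node *demo_find(struct demo_node *head, int id)
{
	for (struct demo_node *n = head; n != NULL; n = n->next) {
		if (n->dead)
			return NULL;	/* aborted: looks exactly like "not found" */
		if (n->id == id)
			return n;
	}
	return NULL;			/* genuine end of list */
}

/* Build the list 1 -> 2 -> 3, optionally mark node 2 dead, search for 'id'. */
bool demo_search(int id, bool second_is_dead)
{
	struct demo_node c = { 3, false, NULL };
	struct demo_node b = { 2, second_is_dead, &c };
	struct demo_node a = { 1, false, &b };

	return demo_find(&a, id) != NULL;
}
```

Node 3 is on the list in both cases, but demo_search(3, true) reports it as
missing, and the caller gets the same NULL it would get for an id that was
never there. Nothing tells the caller that a retry is needed.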
Francesco
On Thu, May 22, 2014 at 12:22 AM, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> On 05/22/2014 12:14 AM, Greg Kroah-Hartman wrote:
>>
>> On Wed, May 21, 2014 at 03:59:58PM -0700, Guenter Roeck wrote:
>>>
>>> On Wed, May 21, 2014 at 01:04:04PM -0700, Francesco Ruggeri wrote:
>>>>
>>>> I have been using an x86 platform.
>>>> When I started working on it I got early crashes until I added the
>>>> check for p not NULL in
>>>>
>>>> +void bus_release_device(struct device *dev)
>>>> +{
>>>> +	struct device_private *p = dev->p;
>>>> +
>>>> +	if (p && klist_node_attached(&p->knode_bus))
>>>> +		klist_put_last(&p->knode_bus);
>>>> +}
>>>> +
>>>>
>>>> Maybe on powerpc *p is overwritten between device_del and device_release?
>>>>
>>>> Or maybe some of the BUG_ONs in the patch? The ones on knode_dead are
>>>> treated as WARN_ONs in the current klist code.
>>>> The one in BUG_ON(!klist_dec_and_del(n)); is new, and in my tests I
>>>> ran into it without the second patch (but only when I ran my module
>>>> and tests).
>>>>
>>> Hi Francesco,
>>>
>>> I replaced the BUG_ON with WARN_ON; still crashes.
>>>
>>> Anyway, the problem seems to be known. I found two related exchanges.
>>>
>>> [1] describes pretty much the same problem. I don't see if/where it was
>>> ever fixed, though.
>>>
>>> [2] is a patch to fix the problem. It did not apply cleanly to 3.14,
>>> so I had to make some adjustments in klist_iter_init_node. Resulting
>>> patch is below. With this patch, the problem is gone. It is not perfect,
>>> as it aborts the loop if it encounters a deleted kobject, but it is better
>>> than nothing. Unfortunately, the patch never made it upstream; no idea
>>> why.
>>> Copying the author and Greg to get additional feedback.
>>>
>>> Guenter
>>>
>>> [1] https://lkml.org/lkml/2008/10/26/79
>>> [2] https://lkml.org/lkml/2012/4/16/218
>>
>>
>> 2 years ago? I have no idea what was up with that, sorry...
>>
>
> Ok, but do you have comments on the patch itself in its current version?
>
> Guenter
>