Re: Kernel Oops while disconnecting USB peripheral (always)

From: Daniel Mack
Date: Sat Jul 28 2012 - 09:30:18 EST


On 28.07.2012 15:25, Bjørn Mork wrote:
> Daniel Mack <zonque@xxxxxxxxx> writes:
>> On 28.07.2012 14:27, Bjørn Mork wrote:
>>
>>> The reason is this change:
>>>
>>> 0998d0631 device-core: Ensure drvdata = NULL when no driver is bound
>>>
>>>
>>> It will make bugs like this suddenly 100% reproducible. But the bugs
>>> *are* in the drivers, and may have been there for a long time. The
>>> drivers have been accessing drvdata after unbinding. They just didn't
>>> crash prior to that commit.
>
> I just realized that I might have been concluding too quickly here, as
> usual..
>
> The crashes referred to in this thread were not NULL pointer
> dereferences, which makes it less likely that this change is the
> cause. Could of course still be related somehow, but not directly.
>
>
>>> But the commit is correct, and a very much needed improvement if my
>>> assumptions are correct. The drivers need fixing and this just makes it
>>> evident.
>>
>> Hmm, interesting. Thanks for sharing this. I personally never saw this
>> bug kicking in, but if I understand your findings correctly, we would
>> need something like the following patch for snd-usb and the storage driver?
>>
>> Sarbojit, could you give this a test and see whether your kernel still
>> crashes in any of the two drivers?
>>
>>
>> Thanks,
>> Daniel
>>
>>
>>
>> diff --git a/sound/usb/card.c b/sound/usb/card.c
>> index d5b5c33..0e8caaa 100644
>> --- a/sound/usb/card.c
>> +++ b/sound/usb/card.c
>> @@ -555,7 +555,7 @@ static void snd_usb_audio_disconnect(struct usb_device *dev,
>>      struct snd_card *card;
>>      struct list_head *p;
>>
>> -    if (chip == (void *)-1L)
>> +    if (chip == (void *)-1L || chip == NULL)
>>          return;
>
> I may be wrong, but I don't think you need this in disconnect. The
> driver will not be unbound until after disconnect returns.

I thought so too, yes. Still, as I don't fully understand the call trace
that is involved across all the driver layers, I thought it might be
worth a try to see whether that fixes it.

> But IMHO, the usage of (void *)-1L as invalid drvdata marker in that
> driver should be replaced with NULL. suspend/resume may also be unsafe
> for example.

Could be, but Sarbojit reported crashes on disconnect, not on suspend.
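
For reference, here is a minimal userspace sketch of what using NULL as
the only "no chip" marker could look like. It is not the snd-usb code:
fake_device, fake_chip and the fake_* helpers are hypothetical stand-ins
that only model the drvdata slot, and fake_unbind() merely imitates the
core clearing drvdata once no driver is bound.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device; models only the drvdata slot. */
struct fake_device {
	void *drvdata;
};

/* Hypothetical stand-in for the driver's private per-card state. */
struct fake_chip {
	int num_interfaces;
};

static void fake_set_drvdata(struct fake_device *dev, void *data)
{
	dev->drvdata = data;
}

static void *fake_get_drvdata(const struct fake_device *dev)
{
	return dev->drvdata;
}

/* Imitates the driver core resetting drvdata when no driver is bound. */
static void fake_unbind(struct fake_device *dev)
{
	dev->drvdata = NULL;
}

/*
 * Disconnect path with NULL as the single "nothing to do" marker.
 * Returns 1 if a chip was found and handled, 0 otherwise.
 */
static int fake_disconnect(struct fake_device *dev)
{
	struct fake_chip *chip = fake_get_drvdata(dev);

	if (chip == NULL)
		return 0;

	chip->num_interfaces--;
	return 1;
}
```

With a single marker, a callback that runs before probe has set drvdata
or after the device has been unbound takes the same early-return path,
so there is no second sentinel value to keep in sync.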

> I don't really think you need those changes for the same reasons I gave
> above.
>
> Sorry if my comment just confused the search for this bug. Bisecting it
> is probably the easiest way to locate it after all.

Yes, definitely.


Thanks, anyway,
Daniel
