Re: GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd)

From: Linus Torvalds
Date: Wed Nov 11 2009 - 14:53:31 EST




On Wed, 11 Nov 2009, Rafael J. Wysocki wrote:
>
> I thought that the problem was somehow related to user space, because it only
> happens after we've thawed tasks. At least, all of the call traces I was able
> to collect indicated so.
>
> Moreover, in a few cases I got
>
> kernel: PM: Finishing wakeup.
> kernel: Restarting tasks ...
> kernel: usb 5-2: USB disconnect, address 2
> kernel: done.
> bluetoothd[3445]: HCI dev 0 unregistered
> bluetoothd[3445]: Unregister path: /org/bluez/3445/hci0
> bluetoothd[3445]: Unregistered interface org.bluez.NetworkPeer on path /org/bluez/3445/hci0
> bluetoothd[3445]: Unregistered interface org.bluez.NetworkHub on path /org/bluez/3445/hci0
> bluetoothd[3445]: Unregistered interface org.bluez.NetworkRouter on path /org/bluez/3445/hci0
> kernel: Slab corruption: size-512 start=ffff88007f1182b0, len=512
>
> and so on (of course, the bluetoothd PID was different each time), so I thought
> that the problem might be related to Bluetooth.

Hmm. Sounds reasonable. It's still that 'size-512', but if the sound
subsystem and the bluetooth code both happen to use that size, that would
explain why there was sound data in the slab.
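To make the "same size bucket" argument concrete, here is a small userspace model. It assumes a simplified list of general-purpose cache sizes, loosely modeled on the SLAB allocator's "size-N" caches. The real set of caches depends on architecture and config, and `cache_for()` and `same_cache()` are hypothetical helpers, not kernel functions. The model shows that two unrelated allocations of, say, 300 and 400 bytes would both be served from the 512-byte cache. A freed object from one subsystem can therefore be reused by another, and stale contents (such as sound data) can show up in a corrupted object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed, simplified list of general-purpose cache sizes (not the exact kernel set). */
static const size_t cache_sizes[] = {32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096};

/* Return the smallest modeled cache size that can hold req bytes, or 0 if none fits. */
static size_t cache_for(size_t req) {
    for (size_t i = 0; i < sizeof cache_sizes / sizeof cache_sizes[0]; i++)
        if (req <= cache_sizes[i])
            return cache_sizes[i];
    return 0;
}

/* True if two request sizes would be served from the same modeled cache. */
static bool same_cache(size_t a, size_t b) {
    size_t ca = cache_for(a);
    return ca != 0 && ca == cache_for(b);
}
```

In this model, `same_cache(300, 400)` is true and both sizes map to 512, which is consistent with the "size-512" reports above.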

> So, I've disabled the Bluetooth subsystem in the kernel config and I'm not able
> to reproduce the problem any more (at least not within 50 consecutive
> suspend-resume and hibernate-resume cycles). Thus Bluetooth seems to be
> at least necessary to reproduce the issue and perhaps it's also the cause of
> it.

Which BT driver are you using? Maybe it's specific to the low-level
driver?

For example, I could imagine that (say) a USB bluetooth dongle (I think
they are common for mice, and are sometimes built into the
motherboard) could get the USB "disconnect" event and get freed while
some work from the resume is still pending.

I'm looking at btusb_disconnect(), for example. It's one of the few BT
drivers that seem to use workqueues, and I'm not seeing a
cancel_work_sync() in the disconnect routine - but maybe the btusb_close()
routine is called indirectly some way that I just don't see.
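The suspected bug pattern (a device structure freed on disconnect while a work item embedded in it is still queued) can be sketched as a single-threaded userspace model. This is not the kernel workqueue API and not what btusb actually does. `struct work_item`, `struct fake_dev`, `model_queue_work()`, `model_cancel_work()`, `model_run_queue()`, `fake_probe()` and `fake_disconnect()` are all hypothetical names. The point is the ordering in `fake_disconnect()`: pending work is cancelled *before* the memory that embeds it is freed. In the kernel the equivalent step would be a `cancel_work_sync()` call, which additionally waits for an instance that is already running. This model cannot show that because nothing runs concurrently here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical deferred work item: a pending flag plus a handler. */
struct work_item {
    bool pending;
    void (*fn)(struct work_item *w);
};

/* Hypothetical per-device state that embeds its work item, like driver private data. */
struct fake_dev {
    struct work_item work;
    int events_handled;
};

/* A tiny single-threaded "workqueue": an array of pointers to queued items. */
#define MAX_QUEUED 8
static struct work_item *queue[MAX_QUEUED];
static size_t nqueued;

/* Queue w unless it is already pending (then this is a no-op). */
static void model_queue_work(struct work_item *w) {
    if (w->pending)
        return;
    assert(nqueued < MAX_QUEUED);
    w->pending = true;
    queue[nqueued++] = w;
}

/* Remove w from the queue if still pending; returns true if it was removed. */
static bool model_cancel_work(struct work_item *w) {
    for (size_t i = 0; i < nqueued; i++) {
        if (queue[i] == w) {
            queue[i] = queue[--nqueued]; /* swap-remove; queue order is not preserved */
            w->pending = false;
            return true;
        }
    }
    return false;
}

/* Run every queued item until the queue is empty; returns how many ran. */
static size_t model_run_queue(void) {
    size_t ran = 0;
    while (nqueued > 0) {
        struct work_item *w = queue[--nqueued];
        w->pending = false;
        w->fn(w); /* would touch freed memory if the owning device were already gone */
        ran++;
    }
    return ran;
}

/* Handler: recover the owning device from the embedded work item (container_of style). */
static void handle_event(struct work_item *w) {
    struct fake_dev *d = (struct fake_dev *)((char *)w - offsetof(struct fake_dev, work));
    d->events_handled++;
}

static struct fake_dev *fake_probe(void) {
    struct fake_dev *d = calloc(1, sizeof *d);
    if (d)
        d->work.fn = handle_event;
    return d;
}

/* Cancel pending work BEFORE freeing the structure that embeds it. */
static void fake_disconnect(struct fake_dev *d) {
    model_cancel_work(&d->work);
    free(d);
}
```

If `fake_disconnect()` skipped the cancel, the later `model_run_queue()` would call `handle_event()` on freed memory. That use-after-free would corrupt whatever object had since reused that allocation.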

Marcel?

Linus