Re: [PATCH 2/3 v2] fb: udlfb: fix hang at disconnect

From: Alexander Holler
Date: Tue Jan 29 2013 - 05:36:19 EST


On 29.01.2013 01:56, Alexander Holler wrote:
On 29.01.2013 01:22, Andrew Morton wrote:
On Fri, 25 Jan 2013 19:49:27 +0100
Alexander Holler <holler@xxxxxxxxxxxxx> wrote:

When a device is disconnected, the driver may hang waiting for urbs it will
never get. Fix this by using a timeout while waiting for the semaphore in use.

There is still a memory leak if a timeout happens, but at least the driver
now continues its disconnect routine.

...

--- a/drivers/video/udlfb.c
+++ b/drivers/video/udlfb.c
@@ -1832,8 +1832,9 @@ static void dlfb_free_urb_list(struct dlfb_data *dev)
 	/* keep waiting and freeing, until we've got 'em all */
 	while (count--) {

-		/* Getting interrupted means a leak, but ok at disconnect */
-		ret = down_interruptible(&dev->urbs.limit_sem);
+		/* Timeout likely occurs at disconnect (resulting in a leak) */
+		ret = down_timeout_killable(&dev->urbs.limit_sem,
+						FREE_URB_TIMEOUT);
 		if (ret)
 			break;

This is rather a hack. Do you have an understanding of the underlying
bug? Why is the driver waiting for things which will never happen?

To add a bit more explanation:

I ran into this bug after moving the fb damage handling into a workqueue (to make the driver usable as a console). That has likely increased the chance that an urb goes missing when the usb stack calls the driver's (usb) disconnect function. But I can't say for sure: I couldn't use the driver (as fbcon) before that change, so I have nothing to compare against.

What currently happens here is something like this:

fb -> damage -> workload which sends urb and waits for answer
device disconnect -> dlfb_usb_disconnect() -> stall (no answer to the above urb)

I don't know why the disconnect waits for all urbs. The code looks like it does that just to free the allocated memory. As I'm not very familiar with the usb stack, I would have to read up on urb handling to find out how else to free the memory.

The previous comment in the code suggests that urbs were already going missing (on shutdown) before, so I assume the problem could occur even without my patch that moved the damage handling into a workqueue, and it would then block shutdown because there is no timeout. I have seen the problem not only on disconnect but on shutdown too (no shutdown was possible), so I have to assume that the previously used down_interruptible() didn't get a signal on shutdown (if the driver is used as fbcon). I therefore consider the timeout necessary.

Regards,

Alexander
