Re: debugging oops after disconnecting Nexio USB touchscreen

From: Alan Stern
Date: Mon Nov 30 2009 - 15:19:52 EST


On Mon, 30 Nov 2009, Ondrej Zary wrote:

> It does not make much sense to me but I think that it crashes inside this list
> manipulation:
>
> prev = ehci->async;
> while (prev->qh_next.qh != qh)
> 	prev = prev->qh_next.qh;

Yes, it's crashing in the "while" test because prev is NULL. This
means the code is looking for qh in the async list but not finding it.
That's supposed to be impossible.

The assembly code is peculiar because it includes stuff that isn't in
the source code! For example, right at this point (after the end of
the loop) there's a test to see whether prev is NULL. Where could that
have come from? Do you have any idea?

> prev->hw_next = qh->hw_next;
> prev->qh_next = qh->qh_next;
> wmb ();

These lines aren't reached.

Does this happen every time you disconnect the Nexio?

You can try patching that loop. If prev is NULL then print an error
message in the log, including the value of qh and the value of
ehci->async, and jump past the following three statements.
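
Roughly what I have in mind, as a user-space sketch rather than a real
patch. The structs here are cut-down stand-ins for the ehci-hcd ones, and
the function name unlink_async_checked() and the fprintf() call are
made up for illustration; in the driver you would use its normal logging
and keep the wmb() where it is.

```c
#include <stdio.h>
#include <stddef.h>

/* Cut-down stand-ins for the ehci-hcd structures (assumption: only the
 * fields touched by the loop are modelled here). */
struct ehci_qh;

union ehci_shadow {
	struct ehci_qh *qh;
};

struct ehci_qh {
	unsigned int hw_next;		/* hardware link, modelled as a plain int */
	union ehci_shadow qh_next;	/* software shadow of the next QH */
};

struct ehci_hcd {
	struct ehci_qh *async;		/* head of the async schedule */
};

/* Hypothetical helper: unlink qh from the async list, but bail out with a
 * log message instead of dereferencing NULL when qh is not on the list.
 * Returns 0 on success, -1 if qh was not found. */
static int unlink_async_checked(struct ehci_hcd *ehci, struct ehci_qh *qh)
{
	struct ehci_qh *prev = ehci->async;

	/* Same walk as the original loop, plus a NULL test on prev. */
	while (prev && prev->qh_next.qh != qh)
		prev = prev->qh_next.qh;

	if (!prev) {
		fprintf(stderr, "qh %p not on async list (async head %p)\n",
			(void *)qh, (void *)ehci->async);
		return -1;	/* skip the three unlink statements */
	}

	prev->hw_next = qh->hw_next;
	prev->qh_next = qh->qh_next;
	/* the kernel code issues wmb() at this point */
	return 0;
}
```

The only behavioural change is the extra "prev &&" test and the early
return; when qh is found, the unlink proceeds exactly as before.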

With that change the system shouldn't crash, although khubd might hang.
But we still need to find out how this could have happened. Try
collecting a usbmon trace while running the test; then let's compare
the usbmon output with the error messages in the log.

Alan Stern
