Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1

From: Thinh Nguyen
Date: Fri Feb 01 2019 - 19:31:55 EST

Hi John,

John Stultz wrote:
> Hey all,
> Since the 5.0 merge window opened, I've been hitting frequent
> dwc3 crashes on reboot and suspend; I've added an example at the
> bottom of this mail.
> I've dug in a bit and have some sense of what's going on.
> In ffs_epfile_io():
> The completion "done" is set up on the stack.
> Later, we set up a request and queue it:
> 	req->context = &done;
> 	...
> 	ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
> Then wait for it:
> 	if (unlikely(wait_for_completion_interruptible(&done))) {
> 		/*
> 		 * To avoid race condition with ffs_epfile_io_complete,
> 		 * dequeue the request first then check
> 		 * status. usb_ep_dequeue API should guarantee no race
> 		 * condition with req->complete callback.
> 		 */
> 		usb_ep_dequeue(ep->ep, req);
> 		interrupted = ep->status < 0;
> 	}
> The problem is that we get interrupted, supposedly dequeue the
> request, and return.
> But then (or in parallel) the IRQ fires and we call complete() on
> the context pointer, which now points to stale stack memory,
> resulting in the panic.
> It seems like usb_ep_dequeue() isn't really preventing the
> completion from running from the IRQ afterwards?
> If I revert dwc3 back to its 4.20 state, I don't see the issue.
> I'll do some bisection to try to narrow things down, but I wanted to
> see if this was a known issue or if anyone had immediate ideas as to
> what might be wrong.

I'm not sure whether this is related, but can you try testing with Felipe's
testing/next branch? It contains a fix for a race condition when the gadget
driver dequeues requests.

Let me know if you still run into this issue.
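The crash described above is a lifetime problem: req->context points at a completion on ffs_epfile_io()'s stack, so if the completion callback can run after the interrupted waiter has returned, it writes into a dead frame. Below is a minimal userspace sketch of that hazard, assuming usb_ep_dequeue() may return before the completion callback runs. Everything named model_*, plus io_complete, controller_thread and interrupted_io, is hypothetical: a pthread mutex/condvar pair stands in for struct completion and a thread stands in for the dwc3 IRQ path. It only illustrates one way to keep the on-stack object alive (wait on it again after dequeueing); it is not the actual f_fs or dwc3 code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void model_init_completion(struct model_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical request/endpoint pair; not struct usb_request/usb_ep. */
struct model_request {
	void (*complete)(struct model_request *req);
	void *context;          /* points at the waiter's on-stack completion */
	int status;
};

struct model_ep {
	struct model_completion cancel;  /* signalled by model_dequeue() */
	struct model_request *req;
};

/* Like ffs_epfile_io_complete(): wake whoever waits on req->context. */
static void io_complete(struct model_request *req)
{
	model_complete(req->context);
}

/* Plays the IRQ path: gives the request back some time after dequeue
 * was requested, i.e. dequeue is asynchronous in this model. */
static void *controller_thread(void *arg)
{
	struct model_ep *ep = arg;

	model_wait_for_completion(&ep->cancel);
	ep->req->status = -ECONNRESET;
	ep->req->complete(ep->req);
	return NULL;
}

/* Asynchronous dequeue: requests cancellation and returns at once. */
static void model_dequeue(struct model_ep *ep)
{
	model_complete(&ep->cancel);
}

/* Interrupted-wait path with the completion on this stack frame. */
static int interrupted_io(struct model_ep *ep)
{
	struct model_completion done;
	struct model_request req = { .complete = io_complete, .context = &done };
	pthread_t irq;
	int rc;

	model_init_completion(&done);
	model_init_completion(&ep->cancel);
	ep->req = &req;
	rc = pthread_create(&irq, NULL, controller_thread, ep); /* "queue" */
	assert(rc == 0);
	(void)rc;

	/* Pretend wait_for_completion_interruptible() was interrupted. */
	model_dequeue(ep);
	/* Do not leave this frame until the callback has used "done". */
	model_wait_for_completion(&done);

	pthread_join(irq, NULL); /* harness cleanup only */
	return req.status;
}
```

Build with -pthread. Note that in this model, dropping the model_wait_for_completion(&done) call would be masked by pthread_join(); the driver has no such join, which is why an explicit wait after dequeueing matters there.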