Re: [PATCH V2] usb: gadget: f_fs: don't free buffer prematurely

From: John Stultz
Date: Wed Mar 20 2019 - 19:43:06 EST

On Wed, Mar 20, 2019 at 4:28 PM Yang, Fei <fei.yang@xxxxxxxxx> wrote:
> > Hey Fei,
> > So while this patch does resolve the issues I was seeing with mainline kernels and recent changes to adbd,
> > Josh pointed out that it wouldn't resolve the issues I was seeing with older kernels, which are slightly different (but still related to aio usage).
> >
> > On the older kernels I'm hitting scheduling while atomic on reboot, which seems to be due to ffs_aio_cancel() taking a spinlock then calling usb_ep_dequeue() which might sleep.
> >
> > It seems a fix for this was tried earlier with d52e4d0c0c428 ("usb:
> > gadget: ffs: Fix BUG when userland exits with submitted AIO
> > transfers") which was then reverted by a9c859033f6e.
> >
> > Elsewhere it seems the ffs driver takes effort to drop any locks before calling usb_ep_dequeue(), so it seems like
> > that should be addressed here too, but it also seems like a recent change to the dwc3 driver has been made to avoid sleeping
> > in that path (see fec9095bdef4 ("usb: dwc3: gadget: remove wait_end_transfer")), which may be why I'm not seeing
> > the problem with mainline (and your patch here, of course). But that also doesn't clarify if it's still a potential issue
> > w/ non-dwc3 platforms.
> >
> > So for older kernels, do you have a suggestion of which approach is advised? Does usb_ep_dequeue need to avoid
> > sleeping or do we need to rework the ffs_aio_cancel logic?
> Are you seeing this issue with Android? When running adb reboot?
> I have tried 4.19 and 4.9 kernel with Android P-dessert on one of the Intel platforms, but no luck on reproducing the issue.

You probably need to be running AOSP/master to trigger this. The
changes which uncovered this landed just last week.

> I will get back to you if I could reproduce the issue. I'm afraid I won't be able to do much by just looking at the code.

So as discussed in further mails, the main issue seems to be the dwc3
code was sleeping in its ep_dequeue logic, which isn't safe as
ffs_aio_cancel calls it while holding a spinlock. Upstream the dwc3
driver has been fixed, but -stable kernels still have the issue.
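
To make the failure mode concrete, here is a minimal userspace sketch of
it. This is not kernel code: every sim_* name is a hypothetical stand-in
I made up for illustration, the "spinlock" is just a counter, and the
"might sleep" check only records a violation instead of producing a
"scheduling while atomic" splat. It only shows why a dequeue callback
that can sleep is a problem when the cancel path calls it with a lock
held, and why a non-sleeping dequeue avoids it.

```c
#include <assert.h>

/* Userspace stand-ins only; none of these are real kernel APIs. */
int atomic_depth;      /* > 0 while a simulated spinlock is held */
int sleep_violations;  /* sleeping calls made in "atomic" context */

void sim_spin_lock(void)
{
    atomic_depth++;
}

void sim_spin_unlock(void)
{
    assert(atomic_depth > 0);
    atomic_depth--;
}

/* Simulated sleep check: record a violation if a lock is held. */
void sim_might_sleep(void)
{
    if (atomic_depth > 0)
        sleep_violations++;
}

/* Dequeue that waits for the transfer to end, so it may sleep. */
int sim_dequeue_waiting(void)
{
    sim_might_sleep();
    return 0;
}

/* Dequeue that returns without waiting, so it never sleeps. */
int sim_dequeue_nowait(void)
{
    return 0;
}

/* Cancel path: invokes the dequeue callback with the lock held. */
int sim_aio_cancel(int (*dequeue)(void))
{
    int ret;

    sim_spin_lock();
    ret = dequeue();
    sim_spin_unlock();
    return ret;
}

/* Number of violations produced by a single cancel call. */
int sim_violations_for(int (*dequeue)(void))
{
    sleep_violations = 0;
    sim_aio_cancel(dequeue);
    return sleep_violations;
}
```

In this model, sim_violations_for(sim_dequeue_waiting) reports one
violation and sim_violations_for(sim_dequeue_nowait) reports none, which
corresponds to the older and the fixed dequeue behavior respectively.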