Re: [PATCH] usb: gadget: f_uac2: fixup feedback endpoint stop
From: Jerome Brunet
Date: Thu Aug 26 2021 - 03:50:03 EST
On Wed 25 Aug 2021 at 21:42, Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> wrote:
> Ferry Toth wrote:
>> Hi,
>>
>> On 25-08-2021 at 11:21, Jerome Brunet wrote:
>>> When the uac2 function is stopped, there seems to be an issue with some
>>> platforms (Intel Merrifield at least)
>>>
>
> The issue isn't hardware specific.
While the actual bug isn't, the report was (given that the issue did not
show up during initial testing but did show up on Ferry's HW).
I was merely citing Ferry's bug report here.
>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> ...
>>> RIP: 0010:dwc3_gadget_del_and_unmap_request+0x19/0xe0
>>> ...
>>> Call Trace:
>>> dwc3_remove_requests.constprop.0+0x12f/0x170
>>> __dwc3_gadget_ep_disable+0x7a/0x160
>>> dwc3_gadget_ep_disable+0x3d/0xd0
>>> usb_ep_disable+0x1c/0x70
>>> u_audio_stop_capture+0x79/0x120 [u_audio]
>>> afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>> composite_setup+0x224/0x1b90 [libcomposite]
>>>
>>> The issue happens only when the gadget is using the sync type "async",
>>> not "adaptive". This indicates that problem is likely coming from the
>>> feedback endpoint, which is only used with async synchronization mode.
>
> This does not describe the actual problem. The problem is that the
> usb_ep_dequeue() can be an asynchronous call, and we can't free the
> request until its completion (from cancellation).
Indeed. I was not sure at the time.
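To make the ownership rule above concrete, here is a minimal user-space sketch of it. Everything named mock_* is a hypothetical stand-in invented for illustration, not the kernel API: the mock dequeue returns 0 when a cancellation has been started (so the completion handler will run later and must do the free) and non-zero when the request was not queued (so the caller frees it right away). This mirrors the shape of the patch, under those assumptions only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a USB request; NOT struct usb_request. */
struct mock_req {
	void *buf;
	bool queued;	/* true while the mock controller owns the request */
	void (*complete)(struct mock_req *req);
};

static int freed_count;	/* how many requests have been released */

static void mock_free_req(struct mock_req *req)
{
	free(req->buf);
	free(req);
	freed_count++;
}

/* Completion handler: the only safe place to free an in-flight request. */
static void mock_complete(struct mock_req *req)
{
	req->queued = false;
	mock_free_req(req);
}

static struct mock_req *mock_alloc_req(bool queued)
{
	struct mock_req *req = calloc(1, sizeof(*req));

	assert(req != NULL);
	req->buf = malloc(4);
	assert(req->buf != NULL);
	req->queued = queued;
	req->complete = mock_complete;
	return req;
}

/* 0: cancellation started, completion will follow; non-zero: not queued. */
static int mock_dequeue(struct mock_req *req)
{
	return req->queued ? 0 : -1;
}

/* The mock controller finishing the cancellation at some later point. */
static void mock_controller_flush(struct mock_req *req)
{
	if (req->queued)
		req->complete(req);
}

/* Same shape as the patched free function: free only if dequeue failed. */
static void mock_free_fback(struct mock_req **slot)
{
	struct mock_req *req = *slot;

	if (!req)
		return;
	if (mock_dequeue(req))
		mock_free_req(req);
	*slot = NULL;
}
```

With a queued request, mock_free_fback() only drops the pointer; the memory is released when mock_controller_flush() later runs the completion handler. With a request that was never queued, it is freed immediately.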
>
>>>
>>> Update the feedback endpoint free function to release the endpoint the
>>> same way it is done for the data endpoint.
>>>
>>> Signed-off-by: Jerome Brunet <jbrunet@xxxxxxxxxxxx>
>>> ---
>>>
>>> Hi Ferry,
>>>
>>> Would you mind trying this before reverting the whole thing ?
>>> The HW I have did not show the issue so far so I can't really check
>>> if it helps. Hopefully, it does ...
>>
>> Tested this evening and confirming that this resolves my issue. I can't
>> say much about the code itself, maybe Thinh?
>
> Sure. I can take a look.
>
>>
>> Would be great if we could get this in instead of reverting the series.
>>
>> Tested-by: Ferry Toth <ftoth@xxxxxxxxxxxxxx> (dwc3 / Intel Merrifield)
>>
>>> drivers/usb/gadget/function/u_audio.c | 15 +++++++++++----
>>> 1 file changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/usb/gadget/function/u_audio.c b/drivers/usb/gadget/function/u_audio.c
>>> index 018dd0978995..63d9340f008e 100644
>>> --- a/drivers/usb/gadget/function/u_audio.c
>>> +++ b/drivers/usb/gadget/function/u_audio.c
>>> @@ -230,7 +230,13 @@ static void u_audio_iso_fback_complete(struct usb_ep *ep,
>>> int status = req->status;
>>> /* i/f shutting down */
>>> - if (!prm->fb_ep_enabled || req->status == -ESHUTDOWN)
>>> + if (!prm->fb_ep_enabled) {
>
> prm->fb_ep_enabled is not protected. Potential race problem here?
Given how the variable is used, I don't think so.
Could you please elaborate?
(I don't think this is really related to the current problem, though.)
>
>>> + kfree(req->buf);
>>> + usb_ep_free_request(ep, req);
>>> + return;
>>> + }
>>> +
>>> + if (req->status == -ESHUTDOWN)
>>> return;
>>> /*
>>> @@ -421,9 +427,10 @@ static inline void free_ep_fback(struct uac_rtd_params *prm, struct usb_ep *ep)
>>> prm->fb_ep_enabled = false;
>>> if (prm->req_fback) {
>>> - usb_ep_dequeue(ep, prm->req_fback);
>>> - kfree(prm->req_fback->buf);
>>> - usb_ep_free_request(ep, prm->req_fback);
>>> + if (usb_ep_dequeue(ep, prm->req_fback)) {
>>> + kfree(prm->req_fback->buf);
>>> + usb_ep_free_request(ep, prm->req_fback);
>>> + }
>>> prm->req_fback = NULL;
>>> }
>>>
>
> On a separate note, I notice that f_uac2 only queues a single feedback
> request at a time for isoc endpoint? Even though the interval is 1ms,
> this will easily cause data drop.
>
> Also, you're ignoring other request error status and still processing
> bogus data on request completion? That doesn't seem right.
The gadget is sending the feedback data, not processing it. Every value
sent is valid. Yes, packets can be missed with the current implementation,
meaning the feedback value is not reported as often as initially
intended. On slower HW, packets are also missed with 2 requests queued,
not only on the feedback endpoint but also on the playback
endpoint. Picking the appropriate value is not straightforward. For the
feedback endpoint it isn't a big deal because, according to the spec, if
the feedback is not sent, the host shall assume the value hasn't
changed. This is why the whole thing works as it is.
I admit things still aren't perfect, but there is progress ...
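To illustrate why a missed feedback packet is harmless, here is a small sketch of the host-side rule described above. The struct and function names are hypothetical and the feedback value is treated as an opaque 32-bit number; this is not taken from any real host driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side state: the last feedback value received. */
struct fb_state {
	uint32_t last;
};

/*
 * pkt points to the feedback value received in this interval, or is NULL
 * if the packet was missed. A missed packet leaves the value unchanged.
 */
static uint32_t host_feedback_update(struct fb_state *st, const uint32_t *pkt)
{
	assert(st != NULL);
	if (pkt)
		st->last = *pkt;
	return st->last;
}
```

Calling host_feedback_update() with NULL after a valid packet returns the previous value, so a dropped packet only delays the next rate correction.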
>
> BR,
> Thinh