Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler

From: Baolin Wang
Date: Thu Jan 05 2017 - 07:11:16 EST


Hi,

On 5 January 2017 at 19:19, Felipe Balbi <balbi@xxxxxxxxxx> wrote:
>
> Hi,
>
> Baolin Wang <baolin.wang@xxxxxxxxxx> writes:
>
> [...]
>
>>>>> and you have triggered this with mailine? How? We don't write to GEVNT*
>>>>> registers from PM code and we only allow runtime_suspend with cable
>>>>> dettached.
>>>>
>>>> Sorry for late reply since I am busy on other things. I just agreed
>>>> with the possible races pointed by Janusz. I need to look at if these
>>>> are happened on my platform and also I found some out of tree code
>>>> which will clean the GEVNTCOUNT register when stop the gadget. I will
>>>> check the mainline kernel and resend new patch if I make this problem
>>>> clearly. Anyway thanks for your help and suggestion.
>>>
>>> IOW, you sent me a patch to be integrated in the tree which everybody in
>>> the whole world uses and you didn't even test anything in that very
>>> tree? How am I supposed to trust you're sending me tested patches from
>>> now on?
>>>
>>> Clearly you have no empathy for those working countless hours to keep
>>> this stable and working. If you're ready to send me a completely
>>> untested patch and claim that it's fixing a race condition you have
>>> never seen for yourself, it becomes difficult to trust any patches
>>> you're sending me.
>>
>> I am sure I send you every patch was tested on my vendor platform and
>> I saw the problem on my platform. But like my said I missed something
>> that we have masked all interrupts in the dwc3 interrupt handler, so
>> the real reason maybe caused by some out of tree code on my vendor
>> platform which will clean the GEVNTCOUNT register when stop the
>> gadget. Moreover I did not only do this one thing, and some other
>
> and this is the very problem I'm referring to. If you have changes on
> DWC3 on your "vendor tree" you're testing *mainline* DWC3. Which kernel
> is your tree even based on?

Kernel version is 4.4.

>
>> problem I also need time to test and investigate. So I think I need
>> some time to make things clear, then I can send you one better patch
>> with the correct explanation, am I wrong here?
>
> you're wrong to assume your vendor tree *with changes on DWC3 driver* is
> equivalent to testing *mainline*. That just doesn't add up.
>
> If you were adding just platform init code (something under your mach-*
> directory, some DTS, etc) that's fine. But you have changes on the USB
> peripheral controller driver. This makes me rather uneasy about your
> patches. I mean, if you have changes to DWC3, what other changes do you
> have there? Also, if your changes are in PM code, which we have support

As below code shows up, we will clean the event count register when
waiting for stop the controller (these out of tree code are not added
by me, so I did not know what's problem it fix before). It is one
possible race on my vendor tree.
static int dwc3_gadget_run_stop(struct dwc3 *dwc, int is_on, int suspend)
{
u32 reg;
u32 timeout = 500, i;

if (pm_runtime_suspended(dwc->dev))
return 0;

reg = dwc3_readl(dwc->regs, DWC3_DCTL);
if (is_on) {
if (dwc->revision <= DWC3_REVISION_187A) {
reg &= ~DWC3_DCTL_TRGTULST_MASK;
reg |= DWC3_DCTL_TRGTULST_RX_DET;
}

if (dwc->revision >= DWC3_REVISION_194A)
reg &= ~DWC3_DCTL_KEEP_CONNECT;

reg = dwc3_readl(dwc->regs, DWC3_DCTL);
reg |= DWC3_DCTL_RUN_STOP;

if (dwc->has_hibernation)
reg |= DWC3_DCTL_KEEP_CONNECT;

dwc->pullups_connected = true;
} else {
reg &= ~DWC3_DCTL_RUN_STOP;

if (dwc->has_hibernation && !suspend)
reg &= ~DWC3_DCTL_KEEP_CONNECT;

dwc->pullups_connected = false;
}

dwc3_writel(dwc->regs, DWC3_DCTL, reg);

do {
reg = dwc3_readl(dwc->regs, DWC3_DSTS);
if (is_on) {
if (!(reg & DWC3_DSTS_DEVCTRLHLT))
break;
} else {
if (reg & DWC3_DSTS_DEVCTRLHLT)
break;

/*
* Per databook, Software needs to acknowledge the
* events that are generated (by writing to GEVNTCOUNTn)
* while it is waiting for this bit to be set to 1.
*/
for (i = 0; i < dwc->num_event_buffers; i++) {
reg = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(i));
if (reg)
dwc3_writel(dwc->regs,
DWC3_GEVNTCOUNT(i), reg);
}
}

timeout--;
if (!timeout)
return -ETIMEDOUT;
udelay(1);
} while (1);

dwc3_trace(trace_dwc3_gadget, "gadget %s data soft-%s",
dwc->gadget_driver
? dwc->gadget_driver->function : "no-function",
is_on ? "connect" : "disconnect");

return 0;
}

Usually if I found one problem on my vendor tree, then I will check if
the mainline kernel has the same problem, if it is I will send one
patch tested on my vendor tree to fix that. Usually it can work. But
for this patch, I made one mistake that I missed we mask the
interrupts as you pointed out, so the reason of the problem I
described in commit log was wrong. I need to check if it is caused by
some out of tree code or mainline kernel also has the same problem.
Then I can send one correct patch with correct reason. But as Mark's
suggestion, I will change to test and solve problem on one upstreamed
board, so that it can make sure the problems are not caused by out of
tree code any more. Sorry for noise again.

> in upstream, this suggests that you're using older kernel from the time
> when we didn't have PM support upstream. This means you're using
> something pre-v4.8. Which kernel are you using?
>
> cheers
>
> --
> balbi



--
Baolin.wang
Best Regards