Re: [PATCH v2 2/2] Revert "usb: dwc3: Don't switch OTG -> peripheral if extcon is present"

From: Andrey Smirnov
Date: Wed Oct 12 2022 - 18:14:18 EST


On Wed, Oct 12, 2022 at 3:32 AM Andy Shevchenko
<andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, Oct 11, 2022 at 01:17:13PM -0700, Andrey Smirnov wrote:
> > On Tue, Oct 11, 2022 at 2:21 AM Andy Shevchenko
> > <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
> > > On Mon, Oct 10, 2022 at 02:40:30PM -0700, Andrey Smirnov wrote:
> > > > On Mon, Oct 10, 2022 at 12:13 AM Andy Shevchenko
> > > > <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
> > > > > On Sun, Oct 09, 2022 at 10:02:26PM -0700, Andrey Smirnov wrote:
> > > > > > On Fri, Oct 7, 2022 at 6:07 AM Ferry Toth <fntoth@xxxxxxxxx> wrote:
>
> ...
>
> > > > > > OK, Ferry, I think I'm going to need clarification on specifics on
> > > > > > your test setup. Can you share your kernel config, maybe your
> > > > > > "/proc/config.gz", somewhere? When you say you are running vanilla
> > > > > > Linux, do you mean it or do you mean vanilla tree + some patch delta?
> > > > > >
> > > > > > The reason I'm asking is because I'm having a hard time reproducing
> > > > > > the problem on my end. In fact, when I build v6.0
> > > > > > (4fe89d07dcc2804c8b562f6c7896a45643d34b2f) and then do a
> > > > > >
> > > > > > git revert 8bd6b8c4b100 0f0101719138 (original revert proposed by Andy)
> > > > > >
> > > > > > I get an infinite loop of reprobing that looks something like (some
> > > > > > debug tracing, function name + line number, included):
> > > > >
> > > > > Yes, this is (one of) known drawback(s) of deferred probe hack. I think
> > > > > the kernel that Ferry runs has a patch that basically reverts one from
> > > > > 2014 [1] and allows to have extcon as a module. (1)
> > > > >
> > > > > [1]: 58b116bce136 ("drivercore: deferral race condition fix")
> > > > >
> > > > > > which renders the system completely unusable, but USB host is
> > > > > > definitely going to be broken too. Now, ironically, with my patch
> > > > > > in-place, an attempt to probe extcon that ends up deferring the probe
> > > > > > happens before the ULPI driver failure (which wasn't failing driver
> > > > > > probe prior to https://lore.kernel.org/all/20220213130524.18748-7-hdegoede@xxxxxxxxxx/),
> > > > > > there no "driver binding" event that re-triggers deferred probe
> > > > > > causing the loop, so the system progresses to a point where extcon is
> > > > > > available and dwc3 driver eventually loads.
> > > > > >
> > > > > > After that, and I don't know if I'm doing the same test, USB host
> > > > > > seems to work as expected. lsusb works, my USB stick enumerates as
> > > > > > expected. Switching the USB mux to micro-USB and back shuts the host
> > > > > > functionality down and brings it up as expected. Now I didn't try to
> > > > > > load any gadgets to make sure USB gadget works 100%, but since you
> > > > > > were saying it was USB host that was broken, I wasn't concerned with
> > > > > > that. Am I doing the right test?
> > > > >
> > > > > Hmm... What you described above sounds more like a yet another attempt to
> > > > > workaround (1). _If_ this is the case, we probably can discuss how to fix
> > > > > it in generic way (somewhere in dd.c, rather than in the certain driver).
> > > >
> > > > No, I'm not describing an attempt to fix anything. Just how vanilla
> > > > v6.0 (where my patch is not reverted) works and where my patch, fixing
> > > > a logical problem in which extcon was requested too late causing a
> > > > forced OTG -> "gadget only" switch, also changed the ordering enough
> > > > to accidentally avoid the loop.
> > >
> > > You still refer to a fix, but my question was if it's the same problem or not.
> > >
> >
> > No, it's not the same problem.
> >
> > > > > That said, the real test case should be performed on top of clean kernel
> > > > > before judging if it's good or bad.
> > > >
> > > > Given your level of involvemnt with this particular platform and you
> > > > being the author of
> > > > https://github.com/edison-fw/meta-intel-edison/blob/master/meta-intel-edison-bsp/recipes-kernel/linux/files/0043b-TODO-driver-core-Break-infinite-loop-when-deferred-p.patch
> > > > I assumed/expected you to double check this before sending this revert
> > > > out. Please do so next time.
> > >
> > > As I said I have not yet restored my testing environment for that platform and
> > > I relied on the Ferry's report. Taking into account the history of breakages
> > > that done for Intel Merrifield, in particular by not wide tested patches
> > > against DWC3 driver, I immediately react with a revert.
> >
> > That's what I'm asking you not to do next time. If you don't have time
> > to restore your testing env or double check Ferry's work, please live
> > with a revert in your local tree until you do.
>
> I trust Ferry's tests as mine and repeating again, we have a bad history
> when people so value their time that breaks our platform,

This is not a good excuse to jump the gun and send a revert without
double checking. Some regressions will always be unavoidable.

> so please test
> your changes in the future that it makes no regressions.
>

This is, in a nutshell, asking me to prove a negative. That's not a
feasible request. To add insult to injury, you are talking about a
platform way past EOL that's out of stock in every major store and
it's by sheer luck that I was able to get the last kit on eBay.
Downstream will always be the ultimate test for regressions given the
sheer number of permutations. A CI/CD rig that would allow developers
to make a regression test run, would make this a much more reasonable
request, without it, end-user(s) is the only "test bed" there is.

> If you want to have a proof that your patches are broken, then I will
> prioritize this. We now have a full release cycle time for that.
>

You prioritizing this now saves me nothing, whereas you prioritizing
this before submitting reverts would've saved me time. That's the
point I'm trying to convey.

> > My time is as valuable
> > as yours and this revert required much more investigation before it
> > was submitted. You lived with
> > https://github.com/edison-fw/meta-intel-edison/blob/master/meta-intel-edison-bsp/recipes-kernel/linux/files/0043b-TODO-driver-core-Break-infinite-loop-when-deferred-p.patch
> > since 5.10, which apparently was needed to either boot or have dwc3,
> > so I don't think there is any real urgency.
>
> It is in my tree only for the purpose of "don't forget that issue".
> I think you can work around it by built-in extcon driver.
>