Re: [PATCH v3] driver core: Break infinite loop when deferred probe can't be satisfied

From: Grant Likely
Date: Thu Mar 26 2020 - 10:47:11 EST




On 26/03/2020 11:54, Andy Shevchenko wrote:
On Wed, Mar 25, 2020 at 03:08:29PM -0700, Saravana Kannan wrote:
On Wed, Mar 25, 2020 at 5:51 AM Andy Shevchenko
<andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
On Tue, Mar 24, 2020 at 08:29:01PM -0700, Saravana Kannan wrote:
On Tue, Mar 24, 2020 at 5:38 AM Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote:
Consider the following scenario.

The main driver of USB OTG controller (dwc3-pci), which has the following
functional dependencies on certain platform:
- ULPI (tusb1210)
- extcon (tested with extcon-intel-mrfld)

Note, that first driver, tusb1210, is available at the moment of
dwc3-pci probing, while extcon-intel-mrfld is built as a module and
won't appear till user space does something about it.

This is depicted by kernel configuration excerpt:

CONFIG_PHY_TUSB1210=y
CONFIG_USB_DWC3=y
CONFIG_USB_DWC3_ULPI=y
CONFIG_USB_DWC3_DUAL_ROLE=y
CONFIG_USB_DWC3_PCI=y
CONFIG_EXTCON_INTEL_MRFLD=m

In the Buildroot environment the modules are probed by alphabetical ordering
of their modaliases. The latter comes to the case when USB OTG driver will be
probed first followed by extcon one.

So, if the platform anticipates extcon device to be appeared, in the above case
we will get deferred probe of USB OTG, because of ordering.

Since current implementation, done by the commit 58b116bce136 ("drivercore:
deferral race condition fix") counts the amount of triggered deferred probe,
we never advance the situation -- the change makes it to be an infinite loop.

Hi Andy,

I'm trying to understand this sequence of steps. Sorry if the questions
are stupid -- I'm not very familiar with USB/PCI stuff.

Thank you for looking into this. My answer below.

As a first thing I would like to tell that there is another example of bad
behaviour of deferred probe with no relation to USB. The proposed change also
fixes that one (however, less possible to find in real life).

Unless I see what the other issue is, I can't speak for the unknown.

Okay, let's talk about other case (actually it's the one which I had noticed
approximately at the time when culprit patch made the kernel).

For some debugging purposes I have been using pin control table in board code.

Since I would like to boot kernel on different systems I have some tables for
non-existing pin control device. Pin control framework returns -EPROBE_DEFER
when trying to probe device with attached table for wrong pin control. This is
fine, the problem is that *any* successfully probed device, which happens in
the deferred probe initcall will desynchronize existing counter. As a result ->
infinite loop. For the record, I didn't realize and didn't investigate that
time the issue and now I can confirm that this is a culprit which is fixed by
this patch.

Specific code path please?

g.