Re: [PATCH 2/5 v3] Modify behaviour of request_*muxed_region()

From: Guenter Roeck
Date: Tue Dec 19 2017 - 11:58:58 EST

On Tue, Dec 19, 2017 at 07:11:11AM +0100, Boszormenyi Zoltan wrote:
> 2017-12-18 20:07 keltezéssel, Guenter Roeck írta:
> >On Mon, Dec 18, 2017 at 09:48:38AM +0100, Zoltan Boszormenyi wrote:
> >>From: Böszörményi Zoltán <zboszor@xxxxx>
> >>
> >>In order to make request_*muxed_region() behave more like
> >>mutex_lock(), a possible failure case needs to be eliminated.
> >>When drivers do not properly share the same I/O region, e.g.
> >>one is using request_region() and the other is using
> >>request_muxed_region(), the kernel didn't warn the user about it.
> >>This change modifies IORESOURCE_MUXED behaviour so it always
> >>goes to sleep waiting for the resuorce to be freed and the
> >
> >resource
> >
> >>inconsistent resource flag usage is logged with KERN_ERR.
> >>
> >>v2: Fixed warnings and extended the comment
> >> about request_declared_muxed_region.
> >>
> >>v3: Rebased for 4.15-rc4
> >>
> >>Signed-off-by: Zoltán Böszörményi <zboszor@xxxxx>
> >>---
> >> kernel/resource.c | 6 +++++-
> >> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/kernel/resource.c b/kernel/resource.c
> >>index 05db9c4e3992..0f26f887ac33 100644
> >>--- a/kernel/resource.c
> >>+++ b/kernel/resource.c
> >>@@ -1143,6 +1143,7 @@ resource_size_t resource_alignment(struct resource *res)
> >> *
> >> * request_declared_muxed_region creates a new shared busy region
> >> * described in an existing resource descriptor.
> >>+ * It only returns if it succeeded.
> >> *
> >> * release_region releases a matching busy region.
> >> * The region is only freed if it was allocated.
> >>@@ -1209,7 +1210,10 @@ struct resource *__request_declared_region(struct resource *parent,
> >> continue;
> >> }
> >> }
> >>- if (conflict->flags & flags & IORESOURCE_MUXED) {
> >>+ if (flags & IORESOURCE_MUXED) {
> >>+ if (!(conflict->flags & IORESOURCE_MUXED))
> >>+ pr_err("Resource conflict between muxed \"%s\" and non-muxed \"%s\" I/O regions!\n",
> >>+ res->name, conflict->name);
> >
> >With this, the muxed resource requestor will hang since the non-muxed
> >requestor will not release the resource. I understand that you are trying
> >to get rid of the error return, but is replacing it with a hang really
> >better ?
> I think it's a definite yes.

I disagree. Some systems are configured to crash on stalls, which is
always worse than failure to load a driver.

I think the real problem is that usb_amd_quirk_pll() does not return
an error. It should, usb_amd_quirk_pll_disable() as well as
usb_amd_quirk_pll_enable() should pass the error to the calling code,
which should handle it. However, there is an even simpler solution -
see below.

> Consider the case of the regression this series is trying to fix.
> Two drivers were trying to access a shared resource, both thinking
> that it's the only one, which was almost true[1] initially.
> An improvement in the i2c-piix4 driver in 4.4-rc3 broke sp5100_tco.
> sp5100_tco was loaded second and failed outright.
> This didn't make anyone to fix the situation, despite the fact
> that there are at least three bugtrackers (kernel, Debian and Red Hat)
> that point out the same problem.
> So, a failing driver initialization is not horrifying enough to get
> something fixed. Maybe a hang is.
> Maybe this behaviour is still too weak, instead of doing it for an
> inconsistent muxed/non-muxed case, it (at least the logging) should
> be done for every case, to detect drivers that try to lock the same
> resource.
> In the case of an inconsistent resource flags usage, the bug is
> not the hang itself, it's the inconsistent resource flags.
Separate problem from the one you are trying to solve. One could argue
that the inconsistent resource use should dump a warning with traceback.
Sure, that would also cause the systems I referred to above to crash,
but it would do so with an actionable traceback, not with an obscure
hang which requires detailed analysis.

> [1] The USB PCI quirks were accessing the same I/O resource without
> locking and I found that on a Kabini machine, we got rare
> "disabled by hub (EMI?), re-enabling..." reports that doesn't occur
> with the locking applied. It is possible that two kernel threads were
> doing I/O at the same time without synchronization.

The usb quirks code only accesses the registers in question to obtain
another set of index registers. It _could_ do that in its initialization
code instead and get the address through amd_chipset_info when needed.
If done corectly, this could actually simplify the related code in
usb_amd_quirk_pll(), since the operation on the address obtained
is the same for all supported chipsets.

Overall, I don't see a need for this API change. There are at least
two other solutions available to handle the usb quirks issue, one of
which is substantially less invasive. The problems in the other drivers
can be solved by using request_muxed_region().