Re: [PATCH net-next 00/19] net: phy: add support for shared interrupts (part 1)
From: Vladimir Oltean
Date: Fri Oct 30 2020 - 18:46:39 EST
On Sat, Oct 31, 2020 at 12:06:42AM +0200, Vladimir Oltean wrote:
> On Fri, Oct 30, 2020 at 10:56:24PM +0100, Heiner Kallweit wrote:
> > I'd just like to avoid the term "shared interrupt", because it has
> > a well-defined meaning. Our major concern isn't shared interrupts
> > but support for multiple interrupt sources (in addition to
> > link change) in a PHY.
>
> You may be a little bit confused Heiner.
> This series adds support for exactly _that_ meaning of shared interrupts.
> Shared interrupts (aka wired-OR on the PCB) don't work today with the
> PHY library. I have a board that won't even boot to prompt when the
> interrupt lines of its 2 PHYs are enabled, that this series fixes.
> You might need to take another look through the commit messages I'm afraid.
Maybe this diagram will help you visualize better.
time
|
| PHY 1 PHY 2 has pending IRQ
| | (e.g. link up)
| v |
| phy_interrupt() |
| called for PHY 1 |
| | |
| v |
| returns IRQ_HANDLED via |
| phy_clear_interrupt() |
| | |
| | |
| v |
| handling of shared IRQ |
| ends here |
| | |
| | v
| | PHY 2 still has pending IRQ
| | because, you know, it wasn't actually
| | serviced
| v |
| phy_interrupt() |
| called for PHY 1 |
| | |
| v |
| returns IRQ_HANDLED via |
| phy_clear_interrupt() |
| | |
| | |
| v |
| handling of shared IRQ |
| ends here |
| | PHY 2: Hey! It's me! Over here!
| | |
| v |
| phy_interrupt() |
| called for PHY 1 |
| | |
| v |
| returns IRQ_HANDLED via |
| phy_clear_interrupt() |
| | |
| | |
| v |
| handling of shared IRQ |
| ends here |
| | PHY 2: Srsly?
| | |
| v |
| phy_interrupt() |
| called for PHY 1 |
| | |
| v |
| ... ...
|
| 21 seconds later
|
| RCU stall
v
This happens because today, the way phy_interrupt() is written, you can
only return IRQ_NONE and give the other driver a chance _if_ your driver
implements .did_interrupt(). But the kernel documentation of
.did_interrupt() only recommends to implement that function if you are a
multi-PHY package driver (otherwise stated, the hardware chip has an
embedded shared IRQ). But as things stand, _everybody_ should implement
.did_interrupt() in order for any combination of PHY drivers to support
shared IRQs.
What Ioana is proposing, and this is something that I fully agree with,
is that we just get rid of the layering where the PHY library tries to
be helpful but instead invites everybody to write systematically bad
code. Anyone knows how to write an IRQ handler with eyes closed, but the
fact that .did_interrupt() is mandatory for proper shared IRQ support is
not obvious to everybody, it seems. So let's just have a dedicated IRQ
handling function per each PHY driver, so that we don't get confused in
this sloppy mess of return values, and the code can actually be
followed.
Even _with_ Ioana's changes, there is one more textbook case of shared
interrupts causing trouble, and that is actually the reason why nobody
likes them except hardware engineers who don't get to deal with this.
time
|
| PHY 1 probed
| (module or built-in)
| | PHY 2 has pending IRQ
| | (it had link up from previous
| v boot, or from bootloader, etc)
| phy_interrupt() |
| called for PHY 1 |
| | |
| v |
| returns IRQ_NONE as |
| it should v
| | PHY 2 still has pending IRQ
| | but its handler wasn't called
| | because its driver has not been
| | yet loaded
| v |
| phy_interrupt() |
| called for PHY 1 |
| | |
| v |
| returns IRQ_NONE as |
| it should v
| | PHY 2: Not again :(
| | |
| v |
| phy_interrupt() |
| called for PHY 1 |
| | |
| v |
| returns IRQ_NONE as |
| it should |
| | |
| ... ...
| | |
| | PHY 2 driver never gets probed
| | either because it's a module or
| | because the system is too busy
| | checking PHY 1 over and over
| | again for an interrupt that
| | it did not trigger
| | |
| ... ...
|
| 21 seconds later
|
| RCU stall
v
The way that it's solved is that it's never 100% solved.
This one you can just avoid, but never recover from it.
To avoid it, you must ensure from your previous boot environments
(bootloader, kexec) that the IRQ line is not pending. Because if you
leave the shared IRQ line pending, the system is kaput if it happens to
probe the drivers in the wrong order (aka don't probe first the driver
that will clear that shared IRQ). It's like Minesweeper, only worse.
That's why the shutdown hook is there, as a best-effort attempt for
Linux to clean up after itself. But we're always at the mercy of the
bootloader, or even at the mercy of chance. If the previous kernel
panicked, there's no orderly cleanup to speak of.
Hope it's clearer now.