Re: [Intel-wired-lan] [PATCH] e1000e: Link flap workaround option for false IRP events

From: Lifshits, Vitaly
Date: Tue Mar 04 2025 - 05:50:01 EST

Next message: Changwoo Min: "[PATCH v4 1/2] sched_ext: Change the event type from u64 to s64"
Previous message: Daniel Palmer: "Re: Bootloaders, was Re: [RFC PATCH 0/3] m68k goes DT"
In reply to: Mark Pearson: "Re: [Intel-wired-lan] [PATCH] e1000e: Link flap workaround option for false IRP events"
Next in thread: Andrew Lunn: "Re: [Intel-wired-lan] [PATCH] e1000e: Link flap workaround option for false IRP events"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 3/3/2025 5:34 AM, Mark Pearson wrote:

Hi Andrew,

On Sun, Mar 2, 2025, at 11:13 AM, Andrew Lunn wrote:

On Sun, Mar 02, 2025 at 03:09:35PM +0200, Lifshits, Vitaly wrote:

Hi Mark,

Hi Andrew

On Thu, Feb 27, 2025, at 11:07 AM, Andrew Lunn wrote:

+ e1e_rphy(hw, PHY_REG(772, 26), &phy_data);

Please add some #define for these magic numbers, so we have some idea
what PHY register you are actually reading. That in itself might help
explain how the workaround actually works.

I don't know what this register does I'm afraid - that's Intel knowledge and has not been shared.

What PHY is it? Often it is just a COTS PHY, and the datasheet might
be available.

Given your setup description, pause seems like the obvious thing to
check. When trying to debug this, did you look at pause settings?
Knowing what this register is might also point towards pause, or
something totally different.

Andrew

For the PHY - do you know a way of determining this easily? I can reach out to the platform team but that will take some time. I'm not seeing anything in the kernel logs, but if there's a recommended way of confirming that would be appreciated.

The PHY is I219 PHY.
The datasheet is indeed accessible to the public:
https://cdrdv2-public.intel.com/612523/ethernet-connection-i219-datasheet.pdf

Thanks for the link.

So it is reading page 772, register 26. Page 772 is all about LPI. So
we can have a #define for that. Register 26 is Memories Power. So we
can also have an #define for that.

Yep - I'll look to add this.

However, that does not really help explain how this helps prevent an
interrupt. I assume playing with EEE settings was also played
with. Not that is register appears to have anything to do with EEE!

I don't think we did tried those - it was never suggested that I can recall (the original debug started 6 months+ ago). I don't know fully what testing Intel did in their lab once the issue was reproduced there.

If you have any particular recommendations we can try that - with a note that we have to run a soak for ~1 week to have confidence if a change made a difference (the issue can reproduce between 1 to 2 days).

Personally I doubt that it is related to EEE since there was no real link flap.

I suggest to try replacing the register read for a short delay or reading the PHY STATUS register instead.

Reading this register was suggested for debug purposes to understand if
there is some misconfiguration. We did not find any misconfiguration.
The issue as we discovered was a link status change interrupt caused the
CSME to reset the adapter causing the link flap.

We were unable to determine what causes the link status change interrupt in
the first place. As stated in the comment, it was only ever observed on
Lenovo P5/P7systems and we couldn't ever reproduce on other systems. The
reproduction in our lab was on a P5 system as well.

Regarding the suggested workaround, there isn’t a clear understanding why it
works. We suspect that reading a PHY register is probably prevents the CSME
from resetting the PHY when it handles the LSC interrupt it gets. However,
it can also be a matter of slight timing variations.

I don't follow what you are saying here. As far as i can see, the
interrupt handler will triggers a read of the BMCR to determine the
link status. It should not matter if there is a spurious interrupt,
the BMCR should report the truth. So does BMCR actually indicate the
link did go down? I also see there is the usual misunderstanding with
how BMCR is latching. It should not be read twice, processed once, it
should be processed each time, otherwise you miss quick link down/up
events.

We communicated that this solution is not likely to be accepted to the
kernel as is, and the initial responses on the mailing list demonstrate the
pushback.

What it has done is start a discussion towards an acceptable
solution. Which is a good thing. But at the moment, the discussion
does not have sufficient details.

Please could somebody describe the chain of events which results in
the link down, and subsequent link up. Is the interrupt spurious, or
does BMCR really indicate the link went down and up again?

I'm fairly certain there is no actual link bounce but I don't know the reason for the interrupt or why it was triggered.

Vitaly, do you have a way of getting these answers from the Intel team that worked on this? I don't think I'll be able to get any answers, unfortunately.

You are correct, from what we saw there was no real link flap there. Only a false link status change interrupt.

On a different topic, I suggest removing the part of the comment below:
* Intel unable to determine root cause.
The issue went through joint debug by Intel and Lenovo, and no obvious spec
violations by either party were found. There doesn’t seem to be value in
including this information in the comments of upstream code.

I partially agree. Assuming the root cause is not found, and a
workaround is used, i expect a commit message with a detailed
description of the chain of events which results in the link
flap. Then a statement that the root cause is unknown, and lastly the
commit message should say the introduced change, for unknown reasons,
solves the issue, and is considered safe because.... Ideally the
workaround should be safe for everybody, and can be just enabled.

Ack - I'll aim to do that, as best I can.

I think as Vitaly notes, the read should not be introduced for the general case, in case it misses link bounces in other situations?

Does that confirm that the module option, so it can be selectively enabled, is a reasonable workaround solution. Let me know if there are other ideas.

Thanks
Mark

Next message: Changwoo Min: "[PATCH v4 1/2] sched_ext: Change the event type from u64 to s64"
Previous message: Daniel Palmer: "Re: Bootloaders, was Re: [RFC PATCH 0/3] m68k goes DT"
In reply to: Mark Pearson: "Re: [Intel-wired-lan] [PATCH] e1000e: Link flap workaround option for false IRP events"
Next in thread: Andrew Lunn: "Re: [Intel-wired-lan] [PATCH] e1000e: Link flap workaround option for false IRP events"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]