Re: [Intel-wired-lan] MDI errors during resume from ACPI S3 (suspend to ram)

From: Neftin, Sasha
Date: Thu Aug 08 2019 - 02:11:00 EST


On 8/7/2019 17:55, Paul Menzel wrote:

Dear Sasha,


On 07.08.19 09:23, Neftin, Sasha wrote:
On 8/6/2019 18:53, Mario.Limonciello@xxxxxxxx wrote:
-----Original Message-----
From: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Sent: Tuesday, August 6, 2019 10:36 AM
To: Jeff Kirsher
Cc: intel-wired-lan@xxxxxxxxxxxxxxxx; Linux Kernel Mailing List; Limonciello, Mario
Subject: MDI errors during resume from ACPI S3 (suspend to ram)

Dear Linux folks,


I am trying to decrease the resume time of Linux 5.3-rc3 on the Dell OptiPlex
5040 with the device below:

    $ lspci -nn -s 00:1f.6
    00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)

pm-graph's script `sleepgraph.py` shows that the driver *e1000e* takes
around 400 ms, which is quite a lot. The call graph trace shows that
`e1000e_read_phy_reg_mdic()` accounts for a large part of that. From
`drivers/net/ethernet/intel/e1000e/phy.c` [1]:

        for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
                udelay(50);
                mdic = er32(MDIC);
                if (mdic & E1000_MDIC_READY)
                        break;
        }
        if (!(mdic & E1000_MDIC_READY)) {
                e_dbg("MDI Read did not complete\n");
                return -E1000_ERR_PHY;
        }
        if (mdic & E1000_MDIC_ERROR) {
                e_dbg("MDI Error\n");
                return -E1000_ERR_PHY;
        }

Unfortunately, these errors are not logged if dynamic debug is disabled.
After rebuilding the Linux kernel with `CONFIG_DYNAMIC_DEBUG` and running

    echo "file drivers/net/ethernet/* +p" | sudo tee /sys/kernel/debug/dynamic_debug/control

I got the messages below.

    [ 4159.204192] e1000e 0000:00:1f.6 net00: MDI Error
    [ 4160.267950] e1000e 0000:00:1f.6 net00: MDI Write did not complete
    [ 4160.359855] e1000e 0000:00:1f.6 net00: MDI Error

Can you please shed a little more light on these errors? Please
find the full log attached.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/e1000e/phy.c#n206

Strictly as a reference point you may consider trying the out-of-tree driver to see if these
behaviors persist.

https://sourceforge.net/projects/e1000/

I can try that in the next few days.

We are using an external PHY. It requires ~200 ms to complete an MDIC
transaction (this depends on the project).

Are you referring to the out-of-tree driver?

I believe the out-of-tree driver takes the same approach to MDIC access.
You need to take this time into account before accessing the PHY. I do
not recommend decreasing the timer in the 'e1000e_read_phy_reg_mdic()'
method; we could end up with a wrong MDI access.
My point was more: if you know that more time is needed before the MDI
setting(?) will succeed, why try it anyway and go into the error paths?
Isn't there some way to poll to find out when MDI can be set up?

e1000e is a very old driver and serves a lot of 1G clients. Each 1GbE MAC/PHY controller has a different configuration depending on the platform.

Kind regards,

Paul

Hello Paul,
Let me get back to you later with more information specific to your device. I will try to find out more details from the design team.