RE: [Intel-wired-lan] [PATCH] i40e: The state of phy may not be correct during power-on

From: Kubalewski, Arkadiusz
Date: Tue Apr 13 2021 - 17:33:40 EST


>On 4/10/21 2:12 AM, Kubalewski, Arkadiusz wrote:
>>> -----Original Message-----
>>> From: Intel-wired-lan <intel-wired-lan-bounces@xxxxxxxxxx> On Behalf Of xiao33522@xxxxxx
>>> Sent: Friday, April 9, 2021 11:18
>>> To: Brandeburg, Jesse <jesse.brandeburg@xxxxxxxxx>; Nguyen, Anthony L <anthony.l.nguyen@xxxxxxxxx>
>>> Cc: netdev@xxxxxxxxxxxxxxx; xiaolinkui <xiaolinkui@xxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; intel-wired-lan@xxxxxxxxxxxxxxxx; kuba@xxxxxxxxxx; davem@xxxxxxxxxxxxx
>>> Subject: [Intel-wired-lan] [PATCH] i40e: The state of phy may not be correct during power-on
>>>
>>> From: xiaolinkui <xiaolinkui@xxxxxxxxxx>
>>>
>>> Sometimes the power on state of the x710 network card indicator is not right, and the indicator shows orange. At this time, the network card speed is Gigabit.
>> By "power on state" you mean that it happens after power-up of the server?
>Yes, it means that sometimes, while the server is still in the BIOS boot
>stage, the network card indicator is wrong (orange indicator).
>

I am still a little confused. At that point (before a proper link is established)
the NIC is supposed to be in so-called PXE mode, which allows for some basic
functionality. Are you sure it happens only "sometimes"? I would say this
behavior is expected after each Power-Off-Reset of the host.

>>
>>> After entering the system, check the network card status through the ethtool command as follows:
>>>
>>> [root@localhost ~]# ethtool enp132s0f0
>>> Settings for enp132s0f0:
>>> 	Supported ports: [ FIBRE ]
>>> 	Supported link modes:   1000baseX/Full
>>> 	                        10000baseSR/Full
>>> 	Supported pause frame use: Symmetric
>>> 	Supports auto-negotiation: Yes
>>> 	Supported FEC modes: Not reported
>>> 	Advertised link modes:  1000baseX/Full
>>> 	                        10000baseSR/Full
>>> 	Advertised pause frame use: No
>>> 	Advertised auto-negotiation: Yes
>>> 	Advertised FEC modes: Not reported
>>> 	Speed: 1000Mb/s
>>> 	Duplex: Full
>>> 	Port: FIBRE
>>> 	PHYAD: 0
>>> 	Transceiver: internal
>>> 	Auto-negotiation: off
>>> 	Supports Wake-on: d
>>> 	Wake-on: d
>>> 	Current message level: 0x00000007 (7)
>>> 	                       drv probe link
>>> 	Link detected: yes
>>>
>>> We can see that the speed is 1000Mb/s.
>>>
>>> If you unplug and plug in the optical cable, it can be restored to 10g.
>>> After this operation, the rate is as follows:
>>>
>>> [root@localhost ~]# ethtool enp132s0f0
>>> Settings for enp132s0f0:
>>> 	Supported ports: [ FIBRE ]
>>> 	Supported link modes:   1000baseX/Full
>>> 	                        10000baseSR/Full
>>> 	Supported pause frame use: Symmetric
>>> 	Supports auto-negotiation: Yes
>>> 	Supported FEC modes: Not reported
>>> 	Advertised link modes:  1000baseX/Full
>>> 	                        10000baseSR/Full
>>> 	Advertised pause frame use: No
>>> 	Advertised auto-negotiation: Yes
>>> 	Advertised FEC modes: Not reported
>>> 	Speed: 10000Mb/s
>>> 	Duplex: Full
>>> 	Port: FIBRE
>>> 	PHYAD: 0
>>> 	Transceiver: internal
>>> 	Auto-negotiation: off
>>> 	Supports Wake-on: d
>>> 	Wake-on: d
>>> 	Current message level: 0x00000007 (7)
>>> 	                       drv probe link
>>> 	Link detected: yes
>>>
>>> Calling i40e_aq_set_link_restart_an can also achieve this function.
>>> So we need to do a reset operation for the network card when the network card status is abnormal.
>> I can't say much about the root cause of the issue right now,
>> but I don't think this is a good idea for a fix.
>> It leads to breaking the existing link each time
>> i40e_aq_get_link_info is called on a 1 Gigabit PHY.
>> For example, 'ethtool -m <dev>' does that.
>>
>> Have you tried reloading the driver?
>> Thanks!
> I tried unloading the driver and then loading it again, but it didn't work. If I pull the fiber optic cable out and plug it back in, the link recovers from 1000Mb/s to 10000Mb/s.
>

Well, that is strange to me, at the very least,
since on driver load there is already a call to:
i40e_aq_set_link_restart_an(hw, true, NULL);
However, for that call to be made, the NIC needs to have up-to-date firmware.
Maybe that is the case here? Have you tried updating the NVM of the NIC?
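
To see which firmware/NVM version the port currently reports, something like
the following could be used. This is only a sketch: the interface name is taken
from the ethtool output quoted above, and fw_version is a hypothetical helper
name, not an existing tool. It relies only on the "firmware-version:" line that
'ethtool -i' prints.

```shell
#!/bin/sh
# fw_version is a hypothetical helper: it reads `ethtool -i` output on stdin
# and prints only the value of the "firmware-version:" line.
fw_version() {
    sed -n 's/^firmware-version:[[:space:]]*//p'
}

# Example (interface name assumed from the report):
#   ethtool -i enp132s0f0 | fw_version
```

The reported string can then be compared against the latest NVM package
available for the card.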

Another way would be to use the link-down-on-close feature:
first enable the link-down-on-close private flag, and then
perform link-down and link-up on the port.
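
A minimal sketch of that sequence, assuming the interface name from the report
(enp132s0f0) and a driver build that exposes the link-down-on-close private
flag. The run and bounce_link helpers are hypothetical names used only for
illustration; by default the script just prints the commands.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints the commands instead of
# running them; set DRY_RUN=0 and run as root to actually apply them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bounce_link is a hypothetical helper name; $1 is the interface.
bounce_link() {
    dev="$1"
    # Enable the private flag so that "down" really drops the physical link.
    run ethtool --set-priv-flags "$dev" link-down-on-close on &&
    # Take the port down and bring it back up to force the link to re-establish.
    run ip link set dev "$dev" down &&
    run ip link set dev "$dev" up
}

# Example: bounce_link enp132s0f0
```

Whether the flag is available can be checked beforehand with
'ethtool --show-priv-flags <dev>'.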

Anyway, I don't think this patch fixes anything; it looks like a workaround
that hides the actual problem.

Thanks

>>> Signed-off-by: xiaolinkui <xiaolinkui@xxxxxxxxxx>
>>> ---
>>> drivers/net/ethernet/intel/i40e/i40e_common.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
>>> index ec19e18305ec..dde0224776ac 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_common.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
>>> @@ -1866,6 +1866,10 @@ i40e_status i40e_aq_get_link_info(struct i40e_hw *hw,
>>>  	hw_link_info->max_frame_size = le16_to_cpu(resp->max_frame_size);
>>>  	hw_link_info->pacing = resp->config & I40E_AQ_CONFIG_PACING_MASK;
>>>
>>> +	if (hw_link_info->phy_type == I40E_PHY_TYPE_1000BASE_SX &&
>>> +	    hw->mac.type == I40E_MAC_XL710)
>>> +		i40e_aq_set_link_restart_an(hw, true, NULL);
>>> +
>>>  	/* update fc info */
>>>  	tx_pause = !!(resp->an_info & I40E_AQ_LINK_PAUSE_TX);
>>>  	rx_pause = !!(resp->an_info & I40E_AQ_LINK_PAUSE_RX);
>>> --
>>> 2.17.1
>>>
>>> _______________________________________________
>>> Intel-wired-lan mailing list
>>> Intel-wired-lan@xxxxxxxxxx
>>> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
>>>