Re: [question] net: phy: rtl8211f: link speed shows 1000Mb/s but actual link speed in phy is 100Mb/s
From: Yonglong Liu
Date: Tue May 12 2020 - 23:04:34 EST
On 2020/5/13 9:59, Andrew Lunn wrote:
> On Wed, May 13, 2020 at 09:34:13AM +0800, Yonglong Liu wrote:
>> Hi, Andrew:
>> Thanks for your reply!
>>
>> On 2020/5/12 22:00, Andrew Lunn wrote:
>>> On Tue, May 12, 2020 at 08:48:21PM +0800, Yonglong Liu wrote:
>>>> I use two devices that both support 1000M, directly connected
>>>> with a network cable. Both devices have autoneg enabled, and I run the
>>>> following test repeatedly:
>>>> ifconfig eth5 down
>>>> ifconfig eth5 up
>>>> sleep $((RANDOM%6))
>>>> ifconfig eth5 down
>>>> ifconfig eth5 up
>>>> sleep 10
>>>>
>>>> With low probability, device A links up at 100Mb/s while device B links up at
>>>> 1000Mb/s (the actual link speed read from the PHY is 100Mb/s), and the network
>>>> does not work.
>>>>
>>>> device A:
>>>> Settings for eth5:
>>>>     Supported ports: [ TP ]
>>>>     Supported link modes:   10baseT/Half 10baseT/Full
>>>>                             100baseT/Half 100baseT/Full
>>>>                             1000baseT/Full
>>>>     Supported pause frame use: Symmetric Receive-only
>>>>     Supports auto-negotiation: Yes
>>>>     Supported FEC modes: Not reported
>>>>     Advertised link modes:  10baseT/Half 10baseT/Full
>>>>                             100baseT/Half 100baseT/Full
>>>>                             1000baseT/Full
>>>>     Advertised pause frame use: Symmetric
>>>>     Advertised auto-negotiation: Yes
>>>>     Advertised FEC modes: Not reported
>>>>     Link partner advertised link modes:  10baseT/Half 10baseT/Full
>>>>                                          100baseT/Half 100baseT/Full
>>>>     Link partner advertised pause frame use: Symmetric
>>>>     Link partner advertised auto-negotiation: Yes
>>>>     Link partner advertised FEC modes: Not reported
>>>>     Speed: 100Mb/s
>>>>     Duplex: Full
>>>>     Port: MII
>>>>     PHYAD: 3
>>>>     Transceiver: internal
>>>>     Auto-negotiation: on
>>>>     Current message level: 0x00000036 (54)
>>>>                            probe link ifdown ifup
>>>>     Link detected: yes
>>>>
>>>> The register values read over MDIO are:
>>>> reg 9 = 0x200
>>>> reg a = 0
>>>>
>>>> device B:
>>>> Settings for eth5:
>>>>     Supported ports: [ TP ]
>>>>     Supported link modes:   10baseT/Half 10baseT/Full
>>>>                             100baseT/Half 100baseT/Full
>>>>                             1000baseT/Full
>>>>     Supported pause frame use: Symmetric Receive-only
>>>>     Supports auto-negotiation: Yes
>>>>     Supported FEC modes: Not reported
>>>>     Advertised link modes:  10baseT/Half 10baseT/Full
>>>>                             100baseT/Half 100baseT/Full
>>>>                             1000baseT/Full
>>>>     Advertised pause frame use: Symmetric
>>>>     Advertised auto-negotiation: Yes
>>>>     Advertised FEC modes: Not reported
>>>>     Link partner advertised link modes:  10baseT/Half 10baseT/Full
>>>>                                          100baseT/Half 100baseT/Full
>>>>                                          1000baseT/Full
>>>>     Link partner advertised pause frame use: Symmetric
>>>>     Link partner advertised auto-negotiation: Yes
>>>>     Link partner advertised FEC modes: Not reported
>>>>     Speed: 1000Mb/s
>>>>     Duplex: Full
>>>>     Port: MII
>>>>     PHYAD: 3
>>>>     Transceiver: internal
>>>>     Auto-negotiation: on
>>>>     Current message level: 0x00000036 (54)
>>>>                            probe link ifdown ifup
>>>>     Link detected: yes
>>>>
>>>> The register values read over MDIO are:
>>>> reg 9 = 0
>>>> reg a = 0x800
>>>>
>>>> I talked to the FAE for the rtl8211f; they said that if negotiation fails at
>>>> 1000Mb/s, the rtl8211f will change reg 9 to 0, then try to negotiate at 100Mb/s.
>>>>
>>>> The problem happened as:
>>>> ifconfig eth5 up -> phy_start -> phy_start_aneg -> phy_modify_changed(MII_CTRL1000)
>>>> (this time both A and B, reg 9 = 0x200) -> wait for link up -> (B: reg 9 changed to 0)
>>>> -> link up.
>>>
>>> This sounds like downshift, but not correctly working. 1Gbps requires
>>> that 4 pairs in the cable work. If a 1Gbps link is negotiated, but
>>> then does not establish because one of the pairs is broken, some PHYs
>>> will try to 'downshift'. They drop down to 100Mbps, which only
>>> requires two pairs of the cable to work. To do this, the PHY should
>>> change what it is advertising, to no longer advertise 1G, just 100M
>>> and 10M. The link partner should then try to use 100Mbps and
>>> hopefully, a link is established.
>>>
>>> Looking at the ethtool, you can see device A is reporting device B is
>>> only advertising up to 100Mbps. Yet it is locally using 1G. That is
>>> broken. So i would say device A has the problem. Are both PHYs
>>> rtl8211f?
>>
>> Both PHYs are rtl8211f. I think Device B is broken. Device B advertises
>> that it supports 1G, but the PHY has actually downshifted to 100M, so
>> Device B links up at 1G on the driver side but at 100M in the PHY.
>
> You have to be careful with the output of ethtool. Downshift is not
> part of 802.3. There is no standard register to indicate it has
> happened. Sometimes there is a vendor register. You should check the
> datasheet, and look at what other PHY drivers do for this, and
> phy_check_downshift().
>
>>> Are you 100% sure your cable and board layout is good? Is it
>>> trying downshift because something is broken? Fix the
>>> cable/connector and the
>
>> Will check the layout with the hardware engineer. This happens with a low
>> probability. When it happens, another down/up operation or restarting
>> autoneg solves it.
>
>>> reason to downshift goes away. But it does not solve the problem if a
>>> customer has a broken cable. So you might want to deliberately cut a
>>> pair in the cable so it becomes 100% reproducible and try to debug it
>>> further. See if you can find out why auto-neg is not working
>>> correctly.
>>
>> So, your opinion is that we should check whether the hardware layout
>> or the cable has a problem?
>
> Well, there are a couple of issues here.
>
> It could be a hardware problem. Best case, it is the cable. But if you
> can reproduce it with other boards, it is a board design issue, which
> you might want to get fixed. If it happens for you in the lab, it will
> probably happen out in the field.
>
> You should also consider what you want to happen with a cable that
> really is broken. It would be nice if downshift worked. Slower
> networking is better than no networking. Unless you have a requirement
> that 100Mbps is too slow for your use case. So you might want to debug
> what is going wrong when downshift happens.
>
>> By the way, is there some mechanism to handle this downshift on the software
>> side? If the PHY has downshifted its advertisement to 100M but software still
>> advertises 1G (just like Device B), the network will always be broken.
>
> You might get some ideas from phy_check_downshift(). A lot will
> depend on what information you can get from the PHY.
>
> Andrew
>
Hi, Andrew:
Thanks very much! That's so helpful!
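
For reference, the register values quoted in this thread can be decoded with a few lines of code. This is only a minimal sketch, assuming the standard 1000BASE-T control/status bit layout as defined in the kernel's include/uapi/linux/mii.h ("reg 9" is MII_CTRL1000, "reg a" is MII_STAT1000). The helper names decode_gigabit_regs() and advertisement_dropped() are hypothetical and are not kernel functions; the second one only illustrates the kind of software-side comparison discussed above, not what phy_check_downshift() actually does.

```python
# Register numbers and bit masks as defined in include/uapi/linux/mii.h.
MII_CTRL1000 = 0x09          # 1000BASE-T control register ("reg 9")
MII_STAT1000 = 0x0A          # 1000BASE-T status register ("reg a")
ADVERTISE_1000FULL = 0x0200  # local PHY advertises 1000BASE-T full duplex
ADVERTISE_1000HALF = 0x0100  # local PHY advertises 1000BASE-T half duplex
LPA_1000FULL = 0x0800        # link partner is 1000BASE-T full duplex capable
LPA_1000HALF = 0x0400        # link partner is 1000BASE-T half duplex capable


def decode_gigabit_regs(ctrl1000, stat1000):
    """Hypothetical helper: return (local_adv_1000, partner_adv_1000)."""
    local = bool(ctrl1000 & (ADVERTISE_1000FULL | ADVERTISE_1000HALF))
    partner = bool(stat1000 & (LPA_1000FULL | LPA_1000HALF))
    return local, partner


def advertisement_dropped(sw_advertises_1000, ctrl1000):
    """Hypothetical check: software believes 1000M is advertised, but the
    PHY's own control register no longer carries the 1000M bits."""
    hw_advertises_1000 = bool(
        ctrl1000 & (ADVERTISE_1000FULL | ADVERTISE_1000HALF)
    )
    return sw_advertises_1000 and not hw_advertises_1000


# Values reported in the thread.
device_a = decode_gigabit_regs(0x200, 0x000)  # reg 9 = 0x200, reg a = 0
device_b = decode_gigabit_regs(0x000, 0x800)  # reg 9 = 0,     reg a = 0x800

print("A: local 1000M adv =", device_a[0], ", partner 1000M adv =", device_a[1])
print("B: local 1000M adv =", device_b[0], ", partner 1000M adv =", device_b[1])

# ethtool on device B still lists 1000baseT/Full as advertised, while the
# PHY register has been cleared, which is the mismatch described above.
print("B advertisement dropped by PHY:", advertisement_dropped(True, 0x000))
```

Read this way, device A still advertises 1000M but sees no 1000M capability from its partner, while device B's PHY has stopped advertising 1000M even though the software state reported by ethtool still lists 1000baseT/Full.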