Re: [PATCH net-next v2 1/1] Documentation: networking: add Twisted Pair Ethernet diagnostics at OSI Layer 1

From: Maxime Chevallier
Date: Thu Oct 03 2024 - 03:54:00 EST


Hi Oleksij,

On Thu, 3 Oct 2024 08:06:02 +0200
Oleksij Rempel <o.rempel@xxxxxxxxxxxxxx> wrote:

> This patch introduces a diagnostic guide for troubleshooting Twisted
> Pair Ethernet variants at OSI Layer 1. It provides detailed steps for
> detecting and resolving common link issues, such as incorrect wiring,
> cable damage, and power delivery problems. The guide also includes
> interface verification steps and PHY-specific diagnostics.

This looks nice! If I may add some suggestions on the layout (the
content looks very good to me):

[ ...]

> +- **Interpreting the ethtool output**:
> +
> + - **Supported ports**: Specifies the physical connection type, such as
> + **Twisted Pair (TP)**.
> +
> + - **Supported link modes**:
> +
> + - For **SPE**: This typically indicates one supported mode.
> + - For **MPE**: Multiple link modes are supported, such as **10baseT/Half,
> + 10baseT/Full, 100baseT/Half, 100baseT/Full**.
> +
> + - **Supported pause frame use**: Not used for Layer 1 diagnostics.
> +
> + - **Supports auto-negotiation**:
> +
> + - For most **SPE** links (e.g., **100baseT1**), autonegotiation is **not
> + supported**.
> +
> + - For **10BaseT1L** and **MPE** links, autonegotiation is typically
> + **Yes**, allowing dynamic negotiation of speed and duplex settings.
> +
> + - **Supported FEC modes**: Forward Error Correction (FEC). Currently not
> + used in this guide.
> +
> + - **Advertised link modes**:
> +
> + - For **SPE** (except **10BaseT1L**), this field will be **Not
> + applicable**, as no link modes can be advertised without autonegotiation.
> +
> + - For **MPE** and **10BaseT1L** links, this will list the link modes that
> + the interface is currently advertising to the link partner.
> +
> + - **Advertised pause frame use**: Not used for Layer 1 diagnostics.
> +
> + - **Advertised auto-negotiation**:
> +
> + - For **SPE** links (except **10BaseT1L**), this will be **No**.
> +
> + - For **MPE** and **10BaseT1L** links, this will be **Yes** if
> + autonegotiation is enabled.
> +
> + - **Link partner advertised link modes**: Relevant for **any device that
> + supports autonegotiation**, such as **MPE** and **10BaseT1L**. This field
> + displays the subset of link modes supported by the link partner and
> + recognized by the local PHY. If autonegotiation is disabled, this field is
> + not applicable. Some drivers (or possibly the hardware) do not provide
> + this information even with autonegotiation enabled on both sides; this is
> + considered a bug and should be fixed.
> +
> + - **Link partner advertised pause frame use**: Indicates whether the link
> + partner is advertising pause frame support. This field is only relevant
> + when autonegotiation is enabled.
> +
> + - **Link partner advertised auto-negotiation**: Displays whether the link
> + partner is advertising autonegotiation. If the link partner supports
> + autonegotiation, this field will show **Yes**. Otherwise, this field
> + will probably not be visible.
> +
> + - **Speed**: Displays the current operational speed of the interface. This
> + field is especially important when **multiple link modes** are supported.
> + If **autonegotiation** is enabled, the speed is typically automatically
> + selected as the **highest common speed** advertised by both link partners.
> +
> + In cases where the link is in **forced mode** and both sides support
> + multiple speeds, it is crucial to verify that **both sides are forced to
> + the same speed**. A mismatch in forced speeds between the link partners will
> + result in link failure.
> +
> + - **Duplex**: Displays the current duplex setting of the interface, which can
> + be either **Half** or **Full**. In **Full Duplex**, data can be transmitted
> + and received simultaneously, while in **Half Duplex**, transmission and
> + reception occur sequentially. When **autonegotiation** is enabled, the
> + duplex mode is typically negotiated along with the speed.
> +
> + In **forced mode**, it is important to verify that both link partners are
> + configured with the same duplex setting. A **duplex mismatch** (e.g., one
> + side using Full Duplex and the other Half Duplex) usually does not affect
> + link stability, but it often results in **lower performance**, with
> + symptoms such as reduced throughput and possible packet collisions.
> +
> + - **Auto-negotiation**: Indicates whether auto-negotiation is enabled on the
> + **local interface**. This shows that the interface is set to negotiate
> + speed and duplex settings with the link partner. However, even if
> + **auto-negotiation** is enabled locally and the link is established, the
> + link partner might not be using auto-negotiation. In such cases, many PHYs
> + are capable of detecting a **forced mode** on the link partner and
> + adjusting to the correct speed and duplex.
> +
> + If the link partner is in **forced mode**, the **"Link partner
> + advertised"** fields will not be present in the `ethtool` output, as the
> + partner isn't advertising any link modes or capabilities. Additionally, the
> + **"Link partner advertised"** fields may also be missing if the **PHY
> + driver** does not support reporting this information, or if the **MAC
> + driver** is not utilizing the Linux **PHYlib** framework to retrieve and
> + report the PHY status.
> +
> + - **Master-slave configuration**: Indicates the current configuration of the
> + **master-slave role** for the interface. This is relevant for certain
> + Ethernet standards, such as **Single-Pair Ethernet (SPE)** and high-speed
> + Ethernet configurations like **1000Base-T** and above, where one device
> + must act as the **master** and the other as the **slave** for proper link
> + establishment.
> +
> + In **auto-negotiation** mode, the master-slave role is typically negotiated
> + automatically. However, there are options to specify **preferred-master**
> + or **preferred-slave** roles. For example, switches often prefer the master
> + role to reduce the time domain crossing delays.
> +
> + In **forced mode**, it is essential to manually configure the master-slave
> + roles correctly on both link partners. If both sides are forced to the same
> + role (e.g., both forced to master), the link will fail to establish.
> +
> + A combination of **auto-negotiation** with **forced roles** can lead to
> + unexpected behavior. If one side forces a role while the other side uses
> + auto-negotiation, it can result in mismatches, especially if both sides
> + force overlapping roles (preferring overlapping roles is usually not a
> + problem). This configuration should be avoided to ensure reliable link
> + establishment.
> +
> + - **Master-slave status**: Displays the current **master-slave role** of the
> + interface, indicating whether the interface is operating as the **master**
> + or the **slave**. This field is particularly relevant in **auto-negotiation
> + mode**, where the master-slave role is determined dynamically during the
> + negotiation process.
> +
> + In **auto-negotiation**, the role is chosen based on the configuration
> + preferences of both link partners (e.g., **preferred-master** or
> + **preferred-slave**). The **master-slave status** field shows the outcome
> + of this negotiation.
> +
> + In **forced mode**, the master-slave configuration is manually set, so the
> + **status** and **configuration** will always be the same, making this field
> + less relevant in that case.
> +
> + - **Link detected**: Displays whether the physical link is up and running.
> +
> + - **Link Down Events**: Tracks how many times the link has gone down. A high
> + number of **Link Down Events** can indicate a physical issue such as cable
> + problems or instability.
> +
> + - **Signal Quality Indicator (SQI)**: Provides a score for signal quality
> + (e.g., **7/7**). A low score indicates potential physical layer
> + issues like interference.
> +
> + - **MDI-X**: Indicates the MDI/MDI-X status, typically relevant for **MPE**
> + links.
> +
> + - **Supports Wake-on**: Shows whether Wake-on-LAN is supported.
> + Not used for Layer 1 diagnostics.
> +
> + - **Wake-on**: Displays whether Wake-on-LAN is enabled (e.g., **Wake-on: d**
> + for disabled). Not used for Layer 1 diagnostics.

(Sorry for the long scroll down there.) This whole section is more
documentation of what ethtool reports than a troubleshooting guide.
I'm all for getting proper documentation for this, but maybe we could
move it to a dedicated page that we would cross-link from this guide?
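To illustrate the kind of explanation such a page could carry beyond the
field list, here is a rough Python sketch of how the "Speed"/"Duplex" and
"Master-slave status" values typically come out of autonegotiation. It is
simplified: the priority list is only a subset of the IEEE 802.3 Annex 28B.3
resolution order, the master-slave logic is loosely modelled on the
1000BASE-T resolution, and the function names are made up for illustration.
The role strings mirror the values of ethtool's master-slave option.

```python
# Simplified autonegotiation outcome, for illustration only. The mode
# strings mimic ethtool's spelling; none of this is a kernel or ethtool API.

# Highest priority first: a subset of the IEEE 802.3 Annex 28B.3 order.
# Note that 100baseT/Half ranks above 10baseT/Full.
PRIORITY = [
    "1000baseT/Full",
    "100baseT/Full",
    "100baseT/Half",
    "10baseT/Full",
    "10baseT/Half",
]


def resolve_link_mode(local_adv, partner_adv):
    """Return the highest-priority mode both sides advertise, or None."""
    common = set(local_adv) & set(partner_adv)
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None  # no common mode: no link via autonegotiation


def resolve_master_slave(local, partner, local_seed=0, partner_seed=0):
    """Return the local role, "master" or "slave", or "fault".

    local/partner take the values of ethtool's master-slave option:
    "preferred-master", "preferred-slave", "forced-master", "forced-slave".
    """
    local_forced = local.startswith("forced")
    partner_forced = partner.startswith("forced")
    local_master = local.endswith("master")
    partner_master = partner.endswith("master")

    if local_forced and partner_forced:
        # Both forced: only complementary roles can work.
        if local_master == partner_master:
            return "fault"
        return "master" if local_master else "slave"
    if local_forced:
        # A forced role wins over the partner's preference.
        return "master" if local_master else "slave"
    if partner_forced:
        return "slave" if partner_master else "master"
    if local_master != partner_master:
        # Different preferences can both be honoured.
        return "master" if local_master else "slave"
    # Same preference: real PHYs compare random seeds and retry on a tie;
    # this sketch takes the seeds as arguments and reports a tie as a fault.
    if local_seed == partner_seed:
        return "fault"
    return "master" if local_seed > partner_seed else "slave"


print(resolve_link_mode(["100baseT/Full", "1000baseT/Full"],
                        ["10baseT/Full", "100baseT/Full"]))  # 100baseT/Full
print(resolve_master_slave("forced-master", "forced-master"))  # fault
print(resolve_master_slave("preferred-master", "forced-master"))  # slave
```

The point such a page could make is that the reported values are the outcome
of both sides' settings, which is why a forced/autoneg mix is hard to reason
about from one side's ethtool output alone.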

[ ... ]

> +List of Twisted Pair Ethernet Link Modes
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Twisted pair Ethernet variants utilize copper cabling with pairs of wires
> +twisted together to reduce electromagnetic interference (EMI). These link modes
> +are widely used in local area networks (LANs) due to their balance of
> +cost-effectiveness and performance.
> +
> +Below is a list of Ethernet link modes that operate over twisted pair copper
> +cabling. Half and Full duplex variants are combined where applicable.

This section below looks to be in the same ballpark. We already have
documentation on *some* of the MII flavours (SGMII, 1000BaseX, RGMII, etc.);
maybe we could merge the various linkmodes from the MII side and the
MDI side into the same document?

Developers themselves sometimes misunderstand the various linkmodes,
so I think this would warrant its own section.
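For instance, one thing such a section could spell out is how many pairs each
multi-pair mode actually uses, which also explains why a miswired or damaged
cable can still link at 100M but not at 1G. A rough Python sketch; the pair
letters assume the PHY names pairs in 1000BASE-T BI_DA..BI_DD order with A/B
on pins 1-2/3-6, and usable_modes() is a made-up helper:

```python
# Pairs each 8P8C twisted-pair link mode transmits on (assumed naming:
# A = pins 1/2, B = pins 3/6, C = pins 4/5, D = pins 7/8, MDI mode).
# The single-pair T1 modes use their own single pair and are left out.
PAIRS_USED = {
    "10baseT": ("A", "B"),
    "100baseT": ("A", "B"),
    "1000baseT": ("A", "B", "C", "D"),
    "2500baseT": ("A", "B", "C", "D"),
    "5000baseT": ("A", "B", "C", "D"),
    "10000baseT": ("A", "B", "C", "D"),
}


def usable_modes(good_pairs):
    """Modes whose pairs are all reported OK (e.g. by a cable test)."""
    good = set(good_pairs)
    return [mode for mode, pairs in PAIRS_USED.items() if set(pairs) <= good]


print(usable_modes(["A", "B", "D"]))  # ['10baseT', '100baseT']
```

This is also the situation the downshift feature of some PHYs is meant to
handle: falling back to a two-pair mode when a four-pair link cannot come up.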

> +- **10baseT Half/Full**:
> +
> + - The original Ethernet standard over twisted pair cabling.
> + - Supports both half-duplex and full-duplex modes.
> +
> +- **10baseT1L Full**:
> +
> + - Long-reach variant of Ethernet over a single twisted pair.
> + - Supports **autonegotiation** and offers two signal amplitude options:
> +
> + - **2.4 Vpp** for distances up to **1000 meters**.
> + - **1 Vpp** for distances up to **200 meters** (used in hazardous
> + environments).
> +
> + - Primarily used in industrial and building automation environments.
> +
> +- **10baseT1S Half/Full**:
> +
> + - Short-reach variant of Ethernet over a single twisted pair.
> + - Does not support autonegotiation, targeting **fast link establishment within
> + ~10 ms**.
> + - Primarily designed for compact locations, such as automotive environments,
> + where sensors and actuators are clustered.
> + - Supports **multidrop (point-to-multipoint)** configurations, typically used
> + to connect clusters of sensors.
> +
> +- **100baseT Half/Full**:
> +
> + - Also known as Fast Ethernet.
> + - Operates at 100 Mbps over twisted pair cabling.
> + - Supports both half-duplex and full-duplex modes.
> +
> +- **100baseT1 Full**:
> +
> + - Operates at 100 Mbps over a single twisted pair.
> + - Does not support autonegotiation, targeting **fast link creation within
> + ~10 ms**.
> + - Primarily used in automotive and industrial applications.
> +
> +- **1000baseT Full**:
> +
> + - Gigabit Ethernet over twisted pair cabling.
> + - Full-duplex mode is standard and widely used.
> + - Half-duplex mode is defined by the IEEE 802.3ab standard but is rarely
> + used in practice.
> +
> +- **1000baseT1 Full**:
> +
> + - Gigabit Ethernet over a single twisted pair.
> + - Does not support autonegotiation, targeting **fast link creation within
> + ~10 ms**.
> + - Primarily targeted for automotive and industrial use cases.
> +
> +- **2500baseT and 5000baseT Full**:
> +
> + - Multi-Gigabit Ethernet standards.
> + - Designed to provide higher speeds over existing Cat5e/Cat6 cabling.
> + - Operate at 2.5 Gbps and 5 Gbps respectively.
> +
> +- **10000baseT Full**:
> +
> + - 10 Gigabit Ethernet over twisted pair.
> + - Requires Cat6a or better cabling to achieve full distance (up to 100 meters).
> +
> +Potential Layer 1 Related Issues
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +OSI Layer 1 issues pertain to the physical aspects of network communication.
> +Some of these issues are interrelated or subsets of larger problems, impacting
> +network performance and connectivity. Below is a structured overview of common
> +Layer 1 issues, grouped by their relationships:
> +
> +Cable Damage and Related Issues
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +- **Cable Damage**:
> +
> + - **Description**: Physical damage to the Ethernet cable, including cuts,
> + bends, or degradation due to environmental factors such as heat, moisture,
> + or mechanical stress.
> + - **Symptoms**: Intermittent connectivity, reduced speed, or no link.
> + - **Detection**: Cable testers or PHY diagnostics with time-domain
> + reflectometry (TDR) support.
> +
> + - **Subsets of Cable Damage**:
> +
> + - **Open Circuit**:
> +
> + - **Description**: A break or discontinuity in the cable or connector
> + resulting in no electrical connection.
> + - **Symptoms**: No link is detected.
> + - **Detection**: PHY diagnostics can report "Open Circuit".
> + - **Short Circuit**:
> +
> + - **Description**: An unintended electrical connection between two wires
> + that should be separate.
> + - **Symptoms**: The link may not establish, or the link may drop repeatedly.
> + - **Detection**: Cable testers or PoE/PoDL power detection circuits may
> + detect excessive current draw.
> + - **Impedance Mismatch**:
> +
> + - **Description**: Poor cable quality or incorrect termination causes
> + reflections of the signal due to impedance variations.
> + - **Symptoms**: Reduced signal quality, intermittent connectivity at
> + higher speeds.
> + - **Detection**: TDR diagnostics can detect impedance mismatches.
> +
> +Wiring Issues
> +^^^^^^^^^^^^^
> +
> +- **Incorrect Wiring or Pinout**:
> +
> + - **Description**: Incorrect pair wiring or non-standard pin assignments can
> + cause link failure or degraded performance.
> + - **Symptoms**: No link, reduced speed, or high error rates, especially in
> + multi-pair Ethernet standards (e.g., 1000BASE-T).
> + - **Detection**: Modern PHYs may detect and correct some wiring errors
> + (e.g., MDI/MDI-X auto-crossover), but cable testers provide the most
> + reliable diagnostics.
> +
> + - **Subsets of Incorrect Wiring**:
> +
> + - **Miswired Pairs in Multi-Pair Link Modes**:
> +
> + - **Description**: In multi-pair standards like 10BASE-T, 100BASE-TX, or
> + 1000BASE-T, miswired pairs can cause link failures.
> + - **Symptoms**: Incompatible wiring may work for some speeds (e.g.,
> + 100BASE-TX) but fail for higher speeds (e.g., 1000BASE-T).
> + - **Detection**: Cable testers or PHY diagnostics may identify the issue.
> +
> + - **Polarity Reversal within Pairs**:
> +
> + - **Description**: The positive and negative wires within a pair are
> + swapped.
> + - **Symptoms**: No link or intermittent connection unless modern PHYs with
> + automatic polarity correction are in use.
> + - **Detection**: Modern PHYs can detect and correct polarity reversal.
> + Some expose polarity status in diagnostic registers.
> +
> + - **Split Pairs**:
> +
> + - **Description**: The two wires of a pair are split across different
> + pairs, reducing the effectiveness of signal twisting.
> + - **Symptoms**: Increased crosstalk, higher error rates, and intermittent
> + link drops, particularly at higher speeds like 1000BASE-T.
> + - **Detection**: Cable testers can detect split pairs, and error counters
> + in the PHY may provide an indication.
> +
> +Environmental and External Factors
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +- **Electromagnetic Interference (EMI)**:
> +
> + - **Description**: External electromagnetic fields can interfere with Ethernet
> + signals, particularly in unshielded twisted pair (UTP) cables.
> + - **Symptoms**: Increased transmission errors, reduced speed, or intermittent
> + link drops.
> + - **Detection**: Error counters in the PHY or signal quality indicators (SQI)
> + may help diagnose EMI issues.
> +
> +- **Environmental Factors**:
> +
> + - **Description**: External environmental conditions such as temperature
> + extremes, moisture, UV exposure, or mechanical stress can degrade the cable
> + or connectors, leading to signal degradation.
> + - **Symptoms**: Increased error rates, intermittent connectivity, or link
> + failure.
> + - **Detection**: Error counters and physical inspection can reveal issues
> + related to environmental degradation.
> +
> + - **Related Issues**:
> +
> + - **Excessive Cable Length**:
> +
> + - **Description**: Exceeding the maximum allowed cable length for a given
> + standard can lead to signal loss and degradation.
> + - **Symptoms**: Intermittent connectivity, reduced speed, or no link.
> + - **Detection**: TDR diagnostics can measure the cable length. Error
> + counters may show performance degradation.
> +
> +Cable Quality and Type
> +^^^^^^^^^^^^^^^^^^^^^^
> +
> +- **Use of Incorrect Cable Type**:
> +
> + - **Description**: Using a cable that doesn’t meet the required standards for
> + a specific Ethernet mode (e.g., using CAT5e for 10GBASE-T) or improper
> + shielding.
> + - **Symptoms**: Reduced link speed, increased errors, or no link.
> + - **Detection**: PHY diagnostics such as SQI and cable testers can help detect
> + cable quality issues.
> +
> + - **Related Issue**:
> +
> + - **Shielding Problems**: Improper or incomplete attachment of the shield
> + can lead to similar symptoms as EMI issues. Variants include:
> +
> + - **Unattached Shielding**: Shielding present but not connected at the
> + connector.
> + - **Unconnected Device Ports**: Even if the shield is attached, the device
> + port may not provide a connection.
> +
> +Hardware Issues
> +^^^^^^^^^^^^^^^
> +
> +- **Faulty Network Interface Cards (NICs) or PHYs**:
> +
> + - **Description**: Malfunctioning hardware components such as NICs or PHYs may
> + cause link problems.
> + - **Symptoms**: Network performance degradation or complete failure.
> + - **Detection**: Some PHYs and NICs perform self-tests and may report errors
> + in system logs. Swapping hardware may be required to diagnose these issues.
> +
> +Pair Assignment Issues in Multi-Pair Link Modes
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Ethernet standards that use **two or more pairs** of wires - such as
> +**10BASE-T**, **100BASE-TX**, **1000BASE-T**, and higher - require correct pair
> +assignments for proper operation. Incorrect pair assignments can cause
> +significant network problems, especially as data rates increase.
> +
> +Multi-Pair Link Modes
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +- **Applicable Ethernet Standards**:
> +
> + - **10BASE-T** (10 Mbps Ethernet)
> + - **100BASE-TX** (Fast Ethernet)
> + - **1000BASE-T** (Gigabit Ethernet)
> + - **2.5GBASE-T**, **5GBASE-T**, **10GBASE-T**
> +
> +Pin and Pair Naming Conventions
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +In Ethernet troubleshooting, understanding pin, pair, and color-coding
> +conventions is essential, especially when physical cable repairs are necessary.
> +One major challenge arises in the field when a damaged cable pair needs to be
> +identified and fixed without the ability to replace the entire cable. While
> +Linux diagnostics typically only provide pair names (e.g., "Pair A"), these
> +names do not directly map to the color codes commonly used for cable
> +identification in the field.
> +
> +To further complicate the issue, different standards—such as **TIA-568** and
> +**IEEE 802.3**—use varying conventions for assigning pins to pairs, and pairs
> +to color codes. For example, the pair names reported in diagnostics must be
> +translated into physical wire colors, which differ between **TIA-568A** and
> +**TIA-568B** layouts. This translation process is crucial for accurately
> +identifying and repairing the correct cable pair.
> +
> +Although Linux diagnostic tools provide valuable information, their focus on
> +pair names can make it challenging to map these names to the physical cable
> +layout, particularly in fieldwork where color-coded wires are the primary means
> +of identification. This section aims to highlight this problem and provide
> +enough background on pin, pair, and color-coding conventions to assist with
> +analyzing and addressing these issues. While this guide may not fully resolve
> +the difficulties, it offers important context to help bridge the gap between
> +diagnostics and physical cable repair.
> +
> +TIA-568 Pair and Pin Assignments
> +""""""""""""""""""""""""""""""""

This section could also go in another page (standalone or the same
one as above)?

My idea would be to make the troubleshooting guide a bit easier to
read through, with step-by-step instructions on one side, cross-linking
to a page containing these detailed descriptions.
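As an example of the background such a page could hold, here is a rough
Python sketch translating a reported pair name into pins and wire colours for
both TIA-568 layouts. It assumes the PHY's "Pair A".."Pair D" correspond to
the IEEE 802.3 1000BASE-T BI_DA..BI_DD signals, which is PHY-dependent, and
that the caller knows whether MDI-X is active. The helper name is made up:

```python
# IEEE 802.3 1000BASE-T MDI pinout: BI_DA on pins 1/2, BI_DB on 3/6,
# BI_DC on 4/5, BI_DD on 7/8 (MDI, i.e. no crossover).
IEEE_PAIR_PINS = {"A": (1, 2), "B": (3, 6), "C": (4, 5), "D": (7, 8)}

# Wire colour per 8P8C pin for the two TIA-568 termination layouts.
TIA_568_COLOURS = {
    "T568A": {1: "white/green", 2: "green", 3: "white/orange", 4: "blue",
              5: "white/blue", 6: "orange", 7: "white/brown", 8: "brown"},
    "T568B": {1: "white/orange", 2: "orange", 3: "white/green", 4: "blue",
              5: "white/blue", 6: "green", 7: "white/brown", 8: "brown"},
}

# With MDI-X active, A<->B and C<->D swap pins (e.g. BI_DA on pins 3/6).
MDIX_SWAP = {"A": "B", "B": "A", "C": "D", "D": "C"}


def pair_to_wires(pair, layout="T568B", mdix=False):
    """Return [(pin, colour), ...] for a pair name reported by the PHY."""
    if mdix:
        pair = MDIX_SWAP[pair]
    colours = TIA_568_COLOURS[layout]
    return [(pin, colours[pin]) for pin in IEEE_PAIR_PINS[pair]]


print(pair_to_wires("A"))  # [(1, 'white/orange'), (2, 'orange')]
print(pair_to_wires("A", mdix=True))  # [(3, 'white/green'), (6, 'green')]
```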


[ ... ]

> +Linux Kernel Recommendations for Improved Diagnostic Interfaces
> +---------------------------------------------------------------
>
> +As of **Linux kernel v6.11**, several improvements could be implemented to
> +enhance the diagnostic capabilities for Ethernet connections, particularly for
> +twisted pair Ethernet variants. These recommendations aim to address gaps in
> +diagnostics for OSI Layer 1 issues and provide more detailed insights for users
> +and developers.
> +
> +This list will evolve with future kernel versions, reflecting new features or
> +refinements. Below are the current suggestions:

I'm not sure this TODO list belongs in this troubleshooting
guide. I agree with the points you list, but this looks more like a
roadmap for PHY improvements. I don't really know where this list
could go, or whether it's common to maintain this kind of "TODO list" in
the kernel docs. Maybe Andrew has an idea?

Thanks for coming up with such a detailed guide. I also have some "PHY
bring-up 101" ideas on the common errors faced by developers, and this
document would be the ideal place to maintain this crucial information.

Maxime