Re: [PATCH net-next v2 4/6] r8169: add support for RTL8116af

From: Jakub Kicinski

Date: Sat Jun 13 2026 - 18:10:53 EST


This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
r8169: add support for RTL8116af

This change adds support for RTL8116af, a SerDes/SFP variant of the
RTL_GIGA_MAC_VER_52 family that has no internal PHY. It introduces a
phylink_pcs implementation that reads link state directly from the
SerDes registers via OCP and skips r8169_mdio_register() so that no
phy_device is bound, with NULL guards added in several callers.

A few questions about the change below.

[Low]
The commit message says:

Phylink uses pcs to get the link status and speed from its serdes
reg, instead of standard phy reg.

Looking at rtl8169_pcs_get_state(), only link / an_complete / lpa
appear to come from the SerDes registers. The speed and duplex are
hardcoded:

if (state->link) {
state->speed = SPEED_1000;
state->duplex = DUPLEX_FULL;
}

Is that mismatch with the commit message intentional, given that for
1000BASE-X 1000FD is the only valid mode anyway?

> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 615bd4107359..a73c0215b240 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c

[ ... ]

> @@ -3703,12 +3748,14 @@ static void rtl_hw_start_8117(struct rtl8169_private *tp)
>
> rtl_pcie_state_l2l3_disable(tp);
>
> - rg_saw_cnt = phy_read_paged(tp->phydev, 0x0c42, 0x13) & 0x3fff;
> - if (rg_saw_cnt > 0) {
> - u16 sw_cnt_1ms_ini;
> + if (tp->phydev) {
> + rg_saw_cnt = phy_read_paged(tp->phydev, 0x0c42, 0x13) & 0x3fff;
> + if (rg_saw_cnt > 0) {
> + u16 sw_cnt_1ms_ini;
>
> - sw_cnt_1ms_ini = (16000000 / rg_saw_cnt) & 0x0fff;
> - r8168_mac_ocp_modify(tp, 0xd412, 0x0fff, sw_cnt_1ms_ini);
> + sw_cnt_1ms_ini = (16000000 / rg_saw_cnt) & 0x0fff;
> + r8168_mac_ocp_modify(tp, 0xd412, 0x0fff, sw_cnt_1ms_ini);
> + }
> }

[Medium]
rtl_hw_start_8117() is shared between RTL8117 and the new RTL8116af
since both are RTL_GIGA_MAC_VER_52. With this change the calibration
of MAC register 0xd412 from PHY register 0x0c42:0x13 is silently
skipped for 8116af, leaving 0xd412 at its reset value rather than a
calibrated 1ms count.

Does 8116af actually not need 0xd412 configured, or does it need an
alternative calibration source? The commit log does not mention either
case.

[ ... ]

> @@ -5647,6 +5696,10 @@ static void rtl_mac_link_up(struct phylink_config *config, struct phy_device *ph
> static struct phylink_pcs *rtl_mac_select_pcs(struct phylink_config *config,
> phy_interface_t interface)
> {
> + struct rtl8169_private *tp = container_of(config, struct rtl8169_private, phylink_config);
> +
> + if (interface == PHY_INTERFACE_MODE_1000BASEX || interface == PHY_INTERFACE_MODE_SGMII)
> + return &tp->pcs;
> return NULL;
> }
>
> @@ -5655,6 +5708,55 @@ static void rtl_mac_config(struct phylink_config *config, unsigned int mode,
> {
> }
>
> +static u16 rtl8169_sds_read(struct rtl8169_private *tp, u16 sds_reg)
> +{
> + r8168_mac_ocp_write(tp, OCP_SDS_ADDR_REG, sds_reg);
> + r8168_mac_ocp_write(tp, OCP_SDS_CMD_REG, SDS_CMD_READ);
> + return r8168_mac_ocp_read(tp, OCP_SDS_DATA_REG);
> +}

[Medium]
The three-step SDS sequence (write addr, write cmd, read data) is
implemented with three separate r8168_mac_ocp_*() calls, each of which
takes and releases tp->mac_ocp_lock independently.

rtl8169_pcs_get_state() can run from phylink_resolve() (workqueue) and
also from the ethtool/mii ioctl paths, e.g.:

phylink_ethtool_ksettings_get()
-> phylink_mac_pcs_get_state()
-> pcs->ops->pcs_get_state()

These contexts share no common lock. Can two SDS transactions
interleave between the address/cmd writes and the data read, so the
engine sees an addr from one caller and a cmd from another and returns
data for the wrong register?

The existing r8168_mac_ocp_modify() uses a single
raw_spin_lock_irqsave around __r8168_mac_ocp_read/write to keep its
RMW atomic. Should this SDS helper do the same across the three
accesses?

[Medium]
After writing SDS_CMD_READ (0x0001) to OCP_SDS_CMD_REG, the data
register is read immediately without polling for completion. Other
indirect accessors in this driver follow a kick-then-poll pattern, for
example r8168_phy_ocp_read():

RTL_W32(tp, GPHY_OCP, reg << 15);
return rtl_loop_wait_high(tp, &rtl_ocp_gphy_cond, 25, 10) ?
(RTL_R32(tp, GPHY_OCP) & 0xffff) : -ETIMEDOUT;

and rtl_eri_read, rtl_efuse_read, rtl_ephy_read, rtl_csi_read and
r8169_mdio_read_reg all wait on a completion bit before reading the
data register.

Is the SerDes engine guaranteed to complete synchronously here, or can
the read of OCP_SDS_DATA_REG return stale data from the previous
transaction, leading to wrong BMSR / LPA values being reported back to
phylink?

> +
> +static void rtl8169_pcs_get_state(struct phylink_pcs *pcs,
> + unsigned int neg_mode,
> + struct phylink_link_state *state)
> +{
> + struct rtl8169_private *tp = container_of(pcs, struct rtl8169_private, pcs);
> + u16 bmsr, lpa;
> +
> + bmsr = rtl8169_sds_read(tp, RTL_SDS_C22_BASE + MII_BMSR);
> + lpa = rtl8169_sds_read(tp, RTL_SDS_C22_BASE + MII_LPA);
> +
> + state->link = !!(bmsr & BMSR_LSTATUS);
> + state->an_complete = !!(bmsr & BMSR_ANEGCOMPLETE);
> + if (state->link) {
> + state->speed = SPEED_1000;
> + state->duplex = DUPLEX_FULL;
> + } else {
> + state->speed = SPEED_UNKNOWN;
> + state->duplex = DUPLEX_UNKNOWN;
> + }
> +
> + if (lpa & LPA_1000XPAUSE)
> + state->pause |= MLO_PAUSE_RX | MLO_PAUSE_TX;
> +}

[Low]
The 1000BASE-X LPA encodes pause as two bits, LPA_1000XPAUSE and
LPA_1000XPAUSE_ASYM, which together describe four states (none, sym,
asym TX, asym RX). Treating any peer that sets LPA_1000XPAUSE as full
bidirectional pause looks like it ignores the asymmetric cases.

Could this use phylink_mii_c22_pcs_decode_state() (or equivalent
decoding logic) so both LPA pause bits are interpreted correctly?

> +
> +static int rtl8169_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
> + phy_interface_t interface,
> + const unsigned long *advertising,
> + bool permit_pause_to_mac)
> +{
> + return 0;
> +}
> +
> +static int rtl8169_pcs_validate(struct phylink_pcs *pcs, unsigned long *supported,
> + const struct phylink_link_state *state)
> +{
> + return 0;
> +}
> +
> +static void rtl8169_pcs_an_restart(struct phylink_pcs *pcs)
> +{
> +}

[Medium]
pcs_config and pcs_an_restart are both empty stubs. phylink calls
pcs_an_restart from phylink_ethtool_nway_reset(), so an
ethtool -r ethN on an 8116af interface will return success but will
not actually trigger AN restart in hardware. Similarly, pcs_config is
called whenever the inband advertisement changes, so changes via
ethtool -s ... advertise ... are silently dropped.

There does not appear to be any other code path that programs the
SerDes for 1000BASE-X mode or AN advertisement, so the chip seems to
rely on the bootloader leaving the PCS pre-configured.

Should pcs_an_restart at least write BMCR_ANRESTART to the SerDes
BMCR, and pcs_config program the advertisement?

[ ... ]

> @@ -5828,12 +5947,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> }
> tp->aspm_manageable = !rc;
>
> - if (rtl_is_8125(tp)) {
> - u16 data = r8168_mac_ocp_read(tp, 0xd006);
> -
> - if ((data & 0xff) == 0x07)
> - tp->sfp_mode = true;
> - }
> + tp->sfp_mode = rtl_get_sfp_mode(tp);

[Low]
After detection as RTL8116af, chip still resolves to the
RTL_GIGA_MAC_VER_52 entry in rtl_chip_infos[] whose name is
"RTL8168fp/RTL8117". The probe banner

netdev_info(dev, "%s, %pM, %sXID %x, IRQ %d\n",
chip->name, ...);

therefore identifies an 8116af device as RTL8168fp/RTL8117. Should
chip->name be overridden when rtl_get_sfp_mode() returns
RTL_SFP_8168_AF so that dmesg reports the actual variant?

[ ... ]

> @@ -5941,10 +6055,12 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> if (rc)
> return rc;
>
> - rc = r8169_mdio_register(tp);
> - if (rc) {
> - phylink_destroy(tp->phylink);
> - return rc;
> + if (tp->sfp_mode != RTL_SFP_8168_AF) {
> + rc = r8169_mdio_register(tp);
> + if (rc) {
> + phylink_destroy(tp->phylink);
> + return rc;
> + }
> }

[High]
r8169_mdio_register() is the only function that initializes
tp->phy_irq_domain via irq_domain_add_linear(). Skipping it for
RTL_SFP_8168_AF leaves tp->phy_irq_domain as NULL.

rtl_set_irq_mask() is unchanged, so the LinkChg bit remains set in
tp->irq_mask. On a SerDes link transition, rtl8169_interrupt() still
runs:

generic_handle_domain_irq(tp->phy_irq_domain, 0);

with tp->phy_irq_domain == NULL. Inside __irq_resolve_mapping() this
falls back to irq_default_domain, which on some architectures will
look up hwirq 0 in an unrelated domain, while on others
handle_irq_desc(NULL) just returns -EINVAL silently.

Since pcs.poll = true is already set for 8168_AF, link state is
polled via phylink anyway. Should LinkChg be masked out of
tp->irq_mask for 8168_AF, or should the generic_handle_domain_irq()
call be guarded by tp->phy_irq_domain (or tp->phydev)?