Re: [PATCH net-next v2 4/9] net: dsa: lan9645x: add basic dsa driver for LAN9645X

From: Jens Emil Schulz Ostergaard

Date: Wed Apr 01 2026 - 07:57:21 EST


On Sun, 2026-03-29 at 12:56 -0700, Jakub Kicinski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> Subject: net: dsa: lan9645x: add basic dsa driver for LAN9645X
>
> This commit adds a basic DSA driver for the Microchip LAN9645X switch. It
> includes initialization, port module setup for NPI, CPU, and front ports,
> and phylink integration for MAC side configuration.
>
> diff --git a/drivers/net/dsa/microchip/lan9645x/lan9645x_npi.c b/drivers/net/dsa/microchip/lan9645x/lan9645x_npi.c
> --- a/drivers/net/dsa/microchip/lan9645x/lan9645x_npi.c
> +++ b/drivers/net/dsa/microchip/lan9645x/lan9645x_npi.c
> [ ... ]
> > +static void disable_conduit_ipv6(struct lan9645x *lan9645x,
> > + struct net_device *conduit)
> > +{
> > + struct inet6_dev *dev_v6;
> > +
> > + if (!conduit)
> > + return;
> > +
> > + /* IPv6 on the conduit will generate frames bypassing our tag driver, so
> > + * they lack an IFH. This will be garbage in garbage out and we want to
> > + * avoid this.
> > + */
> > + rtnl_lock();
> > + dev_v6 = __in6_dev_get(conduit);
> > + if (dev_v6) {
> > + WRITE_ONCE(dev_v6->cnf.disable_ipv6, 1);
> > + dev_warn(lan9645x->dev, "Disabled IPv6 on conduit device: %s\n",
> > + netdev_name(conduit));
> > + }
> > + rtnl_lock();
> > +}
>
> Does directly modifying dev_v6->cnf.disable_ipv6 bypass the proper sysctl
> teardown path (addrconf_disable_ipv6) and leave the IPv6 stack in an
> inconsistent state on the conduit device by failing to clean up routes,
> addresses, and MLD memberships?
>
> Is it a layering violation for a switch driver to alter the networking
> configuration of an external upper device, especially since this change
> isn't reverted during driver unbind?
>


Yes this will bypass the sysctl teardown path. I can change this to instead
read disable_ipv6 state and warn the user that their device is misconfigured.

> [ ... ]
> > +void lan9645x_npi_port_init(struct lan9645x *lan9645x,
> > + struct dsa_port *cpu_port)
> > +{
> > + int port = cpu_port->index;
> > + struct lan9645x_port *p;
> > +
> > + p = lan9645x_to_port(lan9645x, port);
> > + lan9645x->npi = port;
>
> In a cascaded multi-switch DSA topology, cpu_port->index may belong to a
> different switch and be greater than 8. If lan9645x_to_port returns NULL,
> will this cause a panic when dereferencing p->chip_port below?
>
> [ ... ]
> > + /* Configure IFH prefix mode for NPI port. */
> > + lan_rmw(SYS_PORT_MODE_INCL_XTR_HDR_SET(LAN9645X_TAG_PREFIX_LONG) |
> > + SYS_PORT_MODE_INCL_INJ_HDR_SET(LAN9645X_TAG_PREFIX_NONE),
> > + SYS_PORT_MODE_INCL_XTR_HDR |
> > + SYS_PORT_MODE_INCL_INJ_HDR,
> > + lan9645x,
> > + SYS_PORT_MODE(p->chip_port));
>
> Setting LAN9645X_TAG_PREFIX_NONE means the hardware interprets the first
> bytes of any frame received from the host as an IFH command structure.
>
> Does this allow a local user to transmit crafted frames (for example, via
> raw sockets or standard protocols) whose MAC address bytes map to malicious
> IFH commands, thereby bypassing VLAN isolation or flooding the switch?
>
> Should the hardware be configured to require a strict prefix, such as
> LAN9645X_TAG_PREFIX_LONG, for injected frames to prevent this?
>

A prefix offers no protection against the described scenario. The current
configuration is correct. If a prefix is configured frames MUST contain the
prefix. With the conduit exposed in userspace this is not something we can
guarantee.


> diff --git a/drivers/net/dsa/microchip/lan9645x/lan9645x_main.c b/drivers/net/dsa/microchip/lan9645x/lan9645x_main.c
> --- a/drivers/net/dsa/microchip/lan9645x/lan9645x_main.c
> +++ b/drivers/net/dsa/microchip/lan9645x/lan9645x_main.c
> [ ... ]
> > +static int lan9645x_setup(struct dsa_switch *ds)
> > +{
> > + struct lan9645x *lan9645x = ds->priv;
> > + struct device *dev = lan9645x->dev;
> > + u32 all_phys_ports;
> > + int err;
> > +
> > + lan9645x->num_phys_ports = ds->num_ports;
> > + all_phys_ports = GENMASK(lan9645x->num_phys_ports - 1, 0);
> > +
> > + err = lan9645x_reset_switch(lan9645x);
> > + if (err)
> > + return err;
> > +
> > + lan9645x->ports = devm_kcalloc(lan9645x->dev, lan9645x->num_phys_ports,
> > + sizeof(struct lan9645x_port *),
> > + GFP_KERNEL);
>
> Since the DSA .setup callback can be invoked multiple times during the
> platform device's lifetime (for example, when the DSA master interface is
> unbound and bound again), does using devm_kcalloc here cause a memory leak?
> Should these allocations be moved to .probe or managed with kcalloc and
> freed in .teardown?
>

Hmm I do not think this is right? I think dsa_tree_teardown is called from
dsa_switch_remove which is called by the switch drivers .remove.

So I do not think you can get multiple setup/teardown cycles during the device
lifetime?

As far as I can tell both felix and ksz use device managed allocations in their
setup callback.

> [ ... ]
> > + /* Set all the entries to obey VLAN. */
> > + for (int i = 0; i < PGID_ENTRIES; ++i)
> > + lan_wr(ANA_PGID_CFG_OBEY_VLAN_SET(1),
> > + lan9645x, ANA_PGID_CFG(i));
>
> PGID_ENTRIES is defined as 89, so this loop initializes indices 0 through
> 88. Since the CPU port is index 9, its source PGID is PGID_SRC + CPU_PORT
> (80 + 9 = 89).
>
> Is index 89 left uninitialized, breaking the OBEY_VLAN rule and allowing
> CPU-injected frames to leak across VLAN boundaries?
>

No I this misunderstands OBEY_VLAN. When set the vlan table can control
whether cpu copy from the pgid table is enabled. It makes no sense for PGID 89.

> [ ... ]
> > + /* Multicast to all front ports */
> > + lan_wr(all_phys_ports, lan9645x, ANA_PGID(PGID_MC));
> > +
> > + /* IP multicast to all front ports */
> > + lan_wr(all_phys_ports, lan9645x, ANA_PGID(PGID_MCIPV4));
> > + lan_wr(all_phys_ports, lan9645x, ANA_PGID(PGID_MCIPV6));
> > +
> > + /* Unicast to all front ports */
> > + lan_wr(all_phys_ports, lan9645x, ANA_PGID(PGID_UC));
> > +
> > + /* Broadcast to all ports */
> > + lan_wr(BIT(CPU_PORT) | all_phys_ports, lan9645x, ANA_PGID(PGID_BC));
>
> PGID_BC includes BIT(CPU_PORT) and all_phys_ports (which includes the NPI
> port). Will this forward broadcast frames to both the CPU extraction queue
> and the NPI port's normal egress queue, causing duplicate frames for the host?
>
> Conversely, the multicast masks and PGID_UC exclude BIT(CPU_PORT). Does
> this cause them to bypass the CPU extraction queue entirely, thereby
> lacking the LONG extraction prefix and breaking the host's DSA tagger parsing?
>

No this is not how it works. Generally when you configure the CPU port to use
an NPI port, the hardware manages this internally. You do you have to start
using the npi port number all of a sudden.

> diff --git a/drivers/net/dsa/microchip/lan9645x/lan9645x_phylink.c b/drivers/net/dsa/microchip/lan9645x/lan9645x_phylink.c
> --- a/drivers/net/dsa/microchip/lan9645x/lan9645x_phylink.c
> +++ b/drivers/net/dsa/microchip/lan9645x/lan9645x_phylink.c
> [ ... ]
> > +static int lan9645x_phylink_mac_prepare(struct phylink_config *config,
> > + unsigned int mode,
> > + phy_interface_t iface)
> > +{
> > + struct lan9645x_port *p = lan9645x_phylink_config_to_port(config);
> > + struct lan9645x *lan9645x = p->lan9645x;
> > + int port = p->chip_port;
> > + u32 mask;
> > +
> > + if (port == 5 || port == 6 || port > 8)
> > + return -EINVAL;
> > +
> > + mask = HSIO_HW_CFG_GMII_ENA_SET(BIT(port));
> > + lan_rmw(mask, mask, lan9645x, HSIO_HW_CFG);
> > +
> > + if (port == 4 && phy_interface_mode_is_rgmii(iface))
> > + lan_rmw(HSIO_HW_CFG_RGMII_0_CFG_SET(1),
> > + HSIO_HW_CFG_RGMII_0_CFG,
> > + lan9645x, HSIO_HW_CFG);
>
> The lan9645x_phylink_get_caps function advertises RGMII support for ports
> 4, 7, and 8. Does restricting the hardware multiplexer configuration here to
> port 4 break connectivity for ports 7 and 8 when used in RGMII mode?
>

Yes. The multiplexing controls which port module RGMII0 should use. It is either port 4 or
port 7. So you can not use RGMII on both port 4 and 7 at the same time.
In theory you could use port 8, but the intended use for this is when QSGMII is enabled. This
will always use port modules 5-8. The multiplexing makes it possible to still use
1 RGMII port in this case.

> diff --git a/drivers/net/dsa/microchip/lan9645x/lan9645x_port.c b/drivers/net/dsa/microchip/lan9645x/lan9645x_port.c
> --- a/drivers/net/dsa/microchip/lan9645x/lan9645x_port.c
> +++ b/drivers/net/dsa/microchip/lan9645x/lan9645x_port.c
> [ ... ]
> > +int lan9645x_port_set_maxlen(struct lan9645x *lan9645x, int port, size_t sdu)
> > +{
> > + struct lan9645x_port *p = lan9645x_to_port(lan9645x, port);
> > + int maxlen = sdu + ETH_HLEN + ETH_FCS_LEN;
>
> Does this maximum frame length calculation drop standard 1500-byte MTU
> frames that are 802.1Q VLAN tagged (1522 bytes total), since it does not
> account for VLAN_HLEN?
>

I was about to say that the VLAN overhead is managed seperately via the DEV_MAC_TAGS_CFG
register.

However, after testing I see there is a bug here. We only configure DEV_MAC_TAGS_CFG
for vlan aware ports. This is a problem in the vlan unaware case.
Furthermore, the NPI port does not get this configuration. I will move the
DEV_MAC_TAGS_CFG configuration to port_setup, so we always allow 2x VLAN tags overhead
in both vlan aware and unaware mode, and towards the CPU.

> [ ... ]
> > +int lan9645x_port_setup(struct dsa_switch *ds, int port)
> > +{
> > + struct dsa_port *dp = dsa_to_port(ds, port);
> > + struct lan9645x *lan9645x = ds->priv;
> > + struct lan9645x_port *p;
> > +
> > + p = lan9645x_to_port(lan9645x, port);
> > +
> > + if (dp->dn) {
> > + p->rx_internal_delay =
> > + of_property_present(dp->dn, "rx-internal-delay-ps");
> > + p->tx_internal_delay =
> > + of_property_present(dp->dn, "tx-internal-delay-ps");
> > + }
>
> These are standard integer properties specifying delays in picoseconds. If
> a user explicitly disables the delay via devicetree using a value of 0,
> will of_property_present evaluate to true and enable the hardware delay
> anyway? Should of_property_read_u32 be used instead to check the value?

A value of 0 is not allowed per the bindings. The bindings enforce that if this
is present the value must be 2000.