Re: [PATCH RFC net-next 04/10] net: dsa: mv88e6xxx: Add all hosts mc addr to ATU
From: Vladimir Oltean
Date: Tue Apr 02 2024 - 14:08:24 EST
On Mon, Apr 01, 2024 at 08:11:03PM -0400, Joseph Huang wrote:
> Add local network all hosts multicast address (224.0.0.1) and link-local
> all nodes multicast address (ff02::1) to the ATU so that IGMP/MLD
> Queries can be forwarded even when multicast flooding is disabled on a
> port.
>
> Signed-off-by: Joseph Huang <Joseph.Huang@xxxxxxxxxx>
> ---
> drivers/net/dsa/mv88e6xxx/Kconfig | 12 ++++++++
> drivers/net/dsa/mv88e6xxx/chip.c | 47 +++++++++++++++++++++++++++++++
> 2 files changed, 59 insertions(+)
>
> diff --git a/drivers/net/dsa/mv88e6xxx/Kconfig b/drivers/net/dsa/mv88e6xxx/Kconfig
> index e3181d5471df..ef7798bf50d7 100644
> --- a/drivers/net/dsa/mv88e6xxx/Kconfig
> +++ b/drivers/net/dsa/mv88e6xxx/Kconfig
> @@ -17,3 +17,15 @@ config NET_DSA_MV88E6XXX_PTP
> help
> Say Y to enable PTP hardware timestamping on Marvell 88E6xxx switch
> chips that support it.
> +
> +config NET_DSA_MV88E6XXX_ALWAYS_FLOOD_LOCAL_ALL_HOSTS_ADDRESS
> + bool "Always flood local all hosts multicast packets"
> + depends on NET_DSA_MV88E6XXX
> + help
> + When set to Y, always flood multicast packets destined for
> + 224.0.0.1 (Local Network All Hosts multicast address) and
> + ff02::1 (Link-Local All Nodes multicast address), even when
> + multicast flooding is disabled for a port. This is so that
> + multicast snooping can continue to function even when
> + multicast flooding is disabled.
> + If in doubt, say N.
In its current form, this will never be accepted. The mainline Linux
kernel is not a purpose-built project like a bootloader (and also like
some other uses of the Linux kernel). It gets picked up by
distributions, and the same kernel image is supposed to be used on
multiple platforms. So, customizing behavior at compilation time is not
a viable option. If there is any behavior that needs to be different on
some platforms than on others for some reason (this needs a
justification in its own right), it is handled through dedicated user
space API. When others say "hide custom behavior X behind an option", a
compile-time option is not what they mean.
As for the substance of the change itself, I am far from an authority
on multicast, I think you should try to push for something else, which
should be more palatable for everybody.
Some switches I've been working with have explicit flooding controls for:
- L2 multicast
- IPv4 control multicast (224.0.0.x)
- IPv4 data multicast
- IPv6 control multicast (FF02::/16)
- IPv6 data multicast
Whereas the bridge has a single "mcast_flood" control.
It seems adequate to attempt adding more netlink attributes to control
all of the above, individually. Then you could describe your use case,
in a standard way to the kernel, as "ip link set swp0 type bridge_slave
mcast_ipv4_data_flood off mcast_ipv4_ctrl_flood on". For compatibility,
"mcast_flood" could still be interpreted as a global enable for all
multicast.
The trouble seems to be actually offloading this config to Marvell DSA,
they don't seem to have a classifier that distinguishes between kinds of
multicast traffic.
How many link-local IPv4 and IPv6 addresses are there in actual use?
Would it be feasible to actually add ATU addresses for all of them, like
you did for 224.0.0.1 and ff02::1, and say that the port destinations of
_those_ are the mcast_ipv4_ctrl_flood ports?
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 9ed1821184ec..fddcb596c421 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -2488,6 +2488,41 @@ static int mv88e6xxx_broadcast_setup(struct mv88e6xxx_chip *chip, u16 vid)
> return 0;
> }
>
> +#if IS_ENABLED(CONFIG_NET_DSA_MV88E6XXX_ALWAYS_FLOOD_LOCAL_ALL_HOSTS_ADDRESS)
> +static int mv88e6xxx_port_add_multicast(struct mv88e6xxx_chip *chip, int port,
> + u16 vid)
> +{
> + u8 state = MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC;
> + const char multicast[][ETH_ALEN] = {
> + { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 },
> + { 0x33, 0x33, 0x00, 0x00, 0x00, 0x01 }
> + };
> + int i, err;
> +
> + for (i = 0; i < ARRAY_SIZE(multicast); i++) {
> + err = mv88e6xxx_port_db_load_purge(chip, port, multicast[i], vid, state);
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> +
> +static int mv88e6xxx_multicast_setup(struct mv88e6xxx_chip *chip, u16 vid)
> +{
> + int port;
> + int err;
> +
> + for (port = 0; port < mv88e6xxx_num_ports(chip); port++) {
> + err = mv88e6xxx_port_add_multicast(chip, port, vid);
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> +#endif
> +
> struct mv88e6xxx_port_broadcast_sync_ctx {
> int port;
> bool flood;
> @@ -2572,6 +2607,12 @@ static int mv88e6xxx_port_vlan_join(struct mv88e6xxx_chip *chip, int port,
> err = mv88e6xxx_broadcast_setup(chip, vlan.vid);
> if (err)
> return err;
> +
> +#if IS_ENABLED(CONFIG_NET_DSA_MV88E6XXX_ALWAYS_FLOOD_LOCAL_ALL_HOSTS_ADDRESS)
> + err = mv88e6xxx_multicast_setup(chip, vlan.vid);
> + if (err)
> + return err;
> +#endif
> } else if (vlan.member[port] != member) {
> vlan.member[port] = member;
>
> @@ -3930,6 +3971,12 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
> if (err)
> goto unlock;
>
> +#if IS_ENABLED(CONFIG_NET_DSA_MV88E6XXX_ALWAYS_FLOOD_LOCAL_ALL_HOSTS_ADDRESS)
> + err = mv88e6xxx_multicast_setup(chip, 0);
This is the danger of developing on an old kernel and then just blindly
forward-porting. It will compile and smile and look pretty, but it won't
work. You need to request your board provider to do something and give
you access to mainline code.
In newer kernels, VID 0 is MV88E6XXX_VID_STANDALONE, aka a completely
separate address database compared to what the bridge is in charge of.
So, applied to the mainline kernel as of today, your change does nothing
useful, because when under a VLAN-unaware bridge, packets now get
classified to MV88E6XXX_VID_BRIDGED (4095).
For that matter, the port database (MV88E6XXX_VID_STANDALONE) should be
controlled through dev_mc_add(), and "ip maddr" shows you the link-local
multicast addresses offloaded to the device's RX filter.
mv88e6xxx today does not pass the requirements for dsa_switch_supports_mc_filtering(),
so dev_mc_add() does not actually program anything into hardware, but if
it did, it would have been the port database (VID 0), and this is what
makes your change wrong. You can read more about address databases in
Documentation/networking/dsa/dsa.rst.
> + if (err)
> + goto unlock;
> +#endif
> +
> err = mv88e6xxx_pot_setup(chip);
> if (err)
> goto unlock;
> --
> 2.17.1
>