Re: [PATCH RFC net-next 00/10] MC Flood disable and snooping

From: Joseph Huang
Date: Thu Apr 04 2024 - 18:16:28 EST


On 4/2/2024 5:59 PM, Nikolay Aleksandrov wrote:
On 4/2/24 23:46, Vladimir Oltean wrote:
On Tue, Apr 02, 2024 at 09:50:51PM +0300, Nikolay Aleksandrov wrote:
On 4/2/24 20:43, Vladimir Oltean wrote:
Hi Nikolai,

On Tue, Apr 02, 2024 at 12:28:38PM +0300, Nikolay Aleksandrov wrote:
For the bridge patches:
Nacked-by: Nikolay Aleksandrov <razor@xxxxxxxxxxxxx>

You cannot break the multicast flood flag to add support for a custom
use-case. This is unacceptable. The current bridge behaviour is correct
your patch 02 doesn't fix anything, you should configure the bridge
properly to avoid all those problems, not break protocols.

Your special use case can easily be solved by a user-space helper or
eBPF and nftables. You can set the mcast flood flag and bypass the
bridge for these packets. I basically said the same in 2021, if this is
going to be in the bridge it should be hidden behind an option that is
default off. But in my opinion adding an option to solve such special
cases is undesirable, they can be easily solved with what's currently
available.

I appreciate your time is limited, but could you please translate your
suggestion, and detail your proposed alternative a bit, for those of us
who are not very familiar with IP multicast snooping?


My suggestion is not related to snooping really, but to the goal of
patches 01-03. The bridge patches in this set are trying to forward
traffic that is not supposed to be forwarded with the proposed
configuration,

Correct up to a point. Reinterpreting the given user space configuration
and trying to make it do something else seems like a mistake, but in
principle one could also look at alternative bridge configurations like
the one I described here:
https://lore.kernel.org/netdev/20240402180805.yhhwj2f52sdc4dl2@skbuf/

so that can be done by a user-space helper that installs
rules to bypass the bridge specifically for those packets while
monitoring the bridge state to implement a policy and manage these rules
in order to keep snooping working.

Bypass the bridge for which packets? General IGMP/MLD queries? Wouldn't
that break snooping? And then do what with the packets, forward them in
another software layer than the bridge?


The ones that are not supposed to be forwarded in the proposed config
and are needed for this use case (control traffic and link-local). Obviously
to have proper snooping you'd need to manage these bypass
rules and use them only while needed.

I think Joseph will end up in a situation where he needs IGMP control
messages both in the bridge data path and outside of it :)


My solution does not exclude such scenario. With all unregistered mcast
disabled it will be handled the same as with this proposed solution.

Also, your proposal eliminates the possibility of cooperating with a
hardware accelerator which can forward the IGMP messages where they need
to go.

As far as I understand, I don't think Joseph has a very "special" use case.
Disabling flooding of unregistered multicast in the data plane sounds
reasonable. There seems to be a gap in the bridge API, in that this

This we already have, but..

operation also affects the control plane, which he is trying to fix with
this "force flooding", because of insufficiently fine grained control.


Right, this is the part that is more special, I'm not saying it's
unreasonable. The proposition to make it optional and break it down to
type of traffic sounds good to me.

I also don't quite understand the suggestion of turning on mcast flooding:
isn't Joseph saying that he wants it off for the unregistered multicast
data traffic?

Ah my bad, I meant to turn off flooding and bypass the bridge for those
packets and ports while necessary, under necessary can be any policy
that the user-space helper wants to implement.

In any case, if this is going to be yet another kernel solution then it
must be a new option that is default off, and doesn't break current mcast
flood flag behaviour.

Yeah, maybe something like this, simple and with clear offload
semantics, as seen in existing hardware (not Marvell though):

mcast_flood == off:
- mcast_ipv4_ctrl_flood: don't care (maybe can force to "off")
- mcast_ipv4_data_flood: don't care
- mcast_ipv6_ctrl_flood: don't care
- mcast_ipv6_data_flood: don't care
- mcast_l2_flood: don't care
mcast_flood == on:
- Flood 224.0.0.x according to mcast_ipv4_ctrl_flood
- Flood all other IPv4 multicast according to mcast_ipv4_data_flood
- Flood ff02::/16 according to mcast_ipv6_ctrl_flood
- Flood all other IPv6 multicast according to mcast_ipv6_data_flood
- Flood L2 according to mcast_l2_flood

Did you mean

if mcast_flood == on (meaning mcast_flood is ENABLED)
- mcast_ipv4_ctrl_flood: don't care (since 224.0.0.x will be flooded anyway)
..

if mcast_flood == off (meaning mcast_flood is DISABLED)
- Flood 224.0.0.x according to mcast_ipv4_ctrl_flood
..

? Otherwise the problem is still not solved when mcast_flood is disabled.


Yep, sounds good to me. I was thinking about something in these lines
as well if doing a kernel solution in order to make it simpler and more
generic. The ctrl flood bits need to be handled more carefully to make
sure they match only control traffic and not link-local data.

Do we consider 224.0.0.251 (mDNS) to be control or data? What qualifies as control I guess that's my question.

I think the old option can be converted to use this fine-grained
control, that is if anyone sets the old flood on/off then the flood
mask gets set properly so we can do just 1 & in the fast path and avoid
adding more tests. It will also make it symmetric - if it can override
the on case, then it will be able to override the off case. And to be
more explicit you can pass a mask variable to br_multicast_rcv() which
will get populated and then you can pass it down to br_flood(). That
will also avoid adding new bits to the skb's bridge CB.