On Thu, Aug 08, 2024 at 12:48, Chris Packham <chris.packham@xxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I'm trying to get to grips with how the switchdev notifications are
> supposed to be used when developing a switchdev driver.
>
> I have been reading through
> https://www.kernel.org/doc/html/latest/networking/switchdev.html which
> covers a few things but doesn't go into detail around the notifiers that
> one needs to implement for a new switchdev driver (which is probably
> very dependent on what the hardware is capable of).
>
> Specifically right now I'm looking at having a switch port join a vlan
> aware bridge. I have a configuration something like this
>
> ip link add br0 type bridge vlan_filtering 1
> ip link set sw1p5 master br0
> ip link set sw1p1 master br0
> bridge vlan add vid 2 dev br0 self
> ip link add link br0 br0.2 type vlan id 2
> ip addr add dev br0.2 192.168.2.1/24
> bridge vlan add vid 2 dev lan5 pvid untagged
> bridge vlan add vid 2 dev lan1
> ip link set sw1p5 up
> ip link set sw1p1 up
> ip link set br0 up
> ip link set br0.2 up
>
> Then I'm testing by sending a ping to a nonexistent host on the
> 192.168.2.0/24 subnet and looking at the traffic with tcpdump on another
> device connected to sw1p5.
>
> I'm a bit confused about how I should be calling
> switchdev_bridge_port_offload(). It takes two netdevs (brport_dev and
> dev) but as far as I've been able to see all the callers end up passing
> the same netdev for both of these (some create a driver specific brport
> but this still ends up with brport->dev and dev being the same object).

In the simple case when a switchport is directly attached to a bridge,
brport_dev and dev will be the same. If the attachment is indirect, via
a bond for example, they will differ:

       br0
       /
   bond0
   /   \
sw1p1 sw1p5

In the setup above, the bridge has no reference to any sw*p* interfaces,
so all generated notifications will reference "bond0". By including the
switchdev port in the message back to the bridge, it can perform
validation on the setup; e.g. that bond0 is not made up of interfaces
from different hardware domains.
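
As a rough userspace illustration of that validation (this is not the
bridge's actual code; struct port, its hwdom field and
lowers_share_hwdom() are made up for this sketch), the check amounts to
confirming that every lower device behind the bridge port reports the
same hardware domain:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a switch port netdev. */
struct port {
	const char *name;
	int hwdom;	/* hardware domain; 0 means "not offloaded" */
};

/*
 * Return true if all lower devices of a bridge port (e.g. the members
 * of bond0) belong to one and the same non-zero hardware domain.
 */
static bool lowers_share_hwdom(const struct port *lowers, size_t n)
{
	size_t i;

	if (n == 0 || lowers[0].hwdom == 0)
		return false;

	for (i = 1; i < n; i++) {
		if (lowers[i].hwdom != lowers[0].hwdom)
			return false;
	}

	return true;
}
```

With sw1p1 and sw1p5 on the same switch this returns true; a bond that
mixes in a port from a second switch would be rejected.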

> I've figured out that I need to set tx_fwd_offload=true so that the
> bridge software only sends one packet to the hardware. That makes sense
> as a way of saying that my hardware can take care of sending the packet
> out the right ports.
>
> I do have a problem that what I get from the bridge has a vlan tag
> inserted (which makes sense in sw when the packet goes from br0.2 to
> br0). But I don't actually need it as the hardware will insert a tag for
> me if the port is set up for egress tagging. I can shuffle the Ethernet
> header up but I was wondering if there was a way of telling the bridge
> not to insert the tag?

Signaling tx_fwd_offload=true means assuming responsibility for
delivering each packet to all ports that the bridge would otherwise have
sent individual skbs for.

Let's expand your setup slightly, and see why you need the tag:

  br0.2   br0.3
      \   /
       br0
      / | \
    /   |   \
sw1p1 sw1p3 sw1p5
(2U)  (3U)  (2T,3T)

sw1p5 is now a trunk. We can trigger an ARP broadcast to be sent out
either via br0.2 or br0.3, depending on the subnet we choose to target.
Your driver will receive a single skb to transmit, and skb->dev can be
set to any of sw1p{1,3,5} depending on config order, FDB entries
(i.e. the order of previously received packets) etc., and is thus
nondeterministic.
So presumably, even though you might need to remove the 802.1Q tag from
the frame, you need some way of tagging the packet with the correct VID
in order for the hardware to do the right thing; possibly via a field in
the vendor's hardware specific tag.
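
A minimal sketch of the first half of that, assuming the 802.1Q tag sits
in the frame data itself rather than in skb metadata: pop the tag and
hand back the VID, so that it can be written into whatever field the
hardware expects. pop_vlan_tag() is a hypothetical helper that works on
a plain byte buffer, and the constants are defined locally to keep the
example self-contained:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define ETH_HLEN	14
#define VLAN_HLEN	4
#define ETH_P_8021Q	0x8100
#define VLAN_VID_MASK	0x0fff

/*
 * If the frame carries an 802.1Q tag, remove it in place, shrink *len
 * by four bytes and return the VID. Return -1 (frame untouched) if the
 * frame is untagged or too short to hold a tag.
 */
static int pop_vlan_tag(uint8_t *frame, size_t *len)
{
	uint16_t tpid, tci;

	if (*len < ETH_HLEN + VLAN_HLEN)
		return -1;

	tpid = (uint16_t)(frame[12] << 8 | frame[13]);
	if (tpid != ETH_P_8021Q)
		return -1;

	tci = (uint16_t)(frame[14] << 8 | frame[15]);

	/* Close the gap left by TPID + TCI; the inner EtherType moves up. */
	memmove(frame + 2 * ETH_ALEN, frame + 2 * ETH_ALEN + VLAN_HLEN,
		*len - 2 * ETH_ALEN - VLAN_HLEN);
	*len -= VLAN_HLEN;

	return tci & VLAN_VID_MASK;
}
```

The returned VID is what the driver would then carry over into the
hardware specific tag, if the device has one.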

> Finally I'm confused about the atomic_nb/blocking_nb parameters. Some
> drivers just pass NULL and others pass the same notifier blocks that
> they've already registered with
> register_switchdev_notifier()/register_switchdev_blocking_notifier(). If
> notifiers are registered why does switchdev_bridge_port_offload() take
> them as parameters?

Because when you add a port to the bridge, lots of stuff that you want
to offload might already have been configured. E.g., imagine that you
were to add vlan 2 to br0 before adding the switchports; then you
probably need those events to be replayed to the new ports in order to
add your CPU-facing switchport to vlan 2. However, we do not want to
bother existing bridge members with duplicated events (and risk messing
up any reference counters they might maintain for these
objects). Therefore we bypass the standard notifier calls and "unicast"
the replay events only to the driver for the port being added.
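
To make the difference concrete, here is a toy userspace model of the
two delivery paths. It is only a sketch: struct nb, on_vlan_add(),
notify_all() and replay_to() are invented for illustration and are much
simpler than the kernel's notifier chains.

```c
#include <stddef.h>

/* Hypothetical minimal notifier block with a per-driver refcount. */
struct nb {
	int (*call)(struct nb *nb, int vid);
	int vlan2_refs;		/* how many times VLAN 2 was added */
};

/* Example driver callback: count additions of VLAN 2. */
static int on_vlan_add(struct nb *nb, int vid)
{
	if (vid == 2)
		nb->vlan2_refs++;
	return 0;
}

/* Normal path: every registered listener sees the event. */
static void notify_all(struct nb **chain, size_t n, int vid)
{
	size_t i;

	for (i = 0; i < n; i++)
		chain[i]->call(chain[i], vid);
}

/* Replay path: only the driver of the joining port sees the events. */
static void replay_to(struct nb *nb, const int *vids, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		nb->call(nb, vids[i]);
}
```

Replaying VLAN 2 through notify_all() would bump the count of every
existing member a second time; replay_to() touches only the notifier
block that was passed in for the new port.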

> Thanks,
> Chris