Re: [net-next PATCH 14/19] net: dsa: qca8k: add support for mdb_add/del

From: Vladimir Oltean
Date: Thu Nov 18 2021 - 21:33:45 EST


On Fri, Nov 19, 2021 at 03:19:30AM +0100, Ansuel Smith wrote:
> > > +static int
> > > +qca8k_port_mdb_del(struct dsa_switch *ds, int port,
> > > + const struct switchdev_obj_port_mdb *mdb)
> > > +{
> > > + struct qca8k_priv *priv = ds->priv;
> > > + struct qca8k_fdb fdb = { 0 };
> > > + const u8 *addr = mdb->addr;
> > > + u8 port_mask = BIT(port);
> > > + u16 vid = mdb->vid;
> > > + int ret;
> > > +
> > > + /* Check if entry already exist */
> > > + ret = qca8k_fdb_search(priv, &fdb, addr, vid);
> > > + if (ret < 0)
> > > + return ret;
> > > +
> > > + /* Rule doesn't exist. Why delete? */
> >
> > Because refcounting is hard. In fact with VLANs it is quite possible to
> > delete an absent entry. For MDBs and FDBs, the bridge should now error
> > out before it even reaches to you.
> >
>
> So in this specific case I should simply return 0 to correctly decrement
> the ref, correct?

No, it's fine, don't change anything.

See these?

[ 365.648039] mscc_felix 0000:00:00.5 swp0: failed (err=-2) to del object (id=3)
[ 365.648071] mscc_felix 0000:00:00.5 swp1: failed (err=-2) to del object (id=3)
[ 365.648091] mscc_felix 0000:00:00.5 swp2: failed (err=-2) to del object (id=3)
[ 365.648111] mscc_felix 0000:00:00.5 swp3: failed (err=-2) to del object (id=3)
[ 365.648130] mscc_felix 0000:00:00.5 swp4: failed (err=-2) to del object (id=3)
[68736.079878] mscc_felix 0000:00:00.5 swp0: failed (err=-2) to del object (id=3)
[68736.079912] mscc_felix 0000:00:00.5 swp1: failed (err=-2) to del object (id=3)
[68736.079934] mscc_felix 0000:00:00.5 swp2: failed (err=-2) to del object (id=3)
[68736.079954] mscc_felix 0000:00:00.5 swp3: failed (err=-2) to del object (id=3)
[68736.079974] mscc_felix 0000:00:00.5 swp4: failed (err=-2) to del object (id=3)

err=-2 is -ENOENT, this driver is complaining about the fact that
->port_mdb_del() is called on MDB entries that are no longer in
hardware. And the system isn't doing anything, mind you, just idling.

Long story short, this used to be an issue until commit 3f6e32f92a02
("net: dsa: reference count the FDB addresses at the cross-chip notifier
level") - if you backport anything to v5.10 you'll notice that it'll
complain there, the refcounting is something that appeared in v5.14.

My comment was just to explain "why delete if there's no entry in
hardware" - because there isn't (wasn't) any logic to avoid doing so.

> > > + if (!fdb.aging)
> > > + return -EINVAL;