[PATCH] net: bridge: temporarily drop br->lock for br_switchdev_set_port_flag in sysfs

From: Vladimir Oltean
Date: Tue Feb 09 2021 - 12:43:40 EST


Since we would like br_switchdev_set_port_flag to not use an atomic
notifier, it should be called from outside spinlock context.

Dropping the lock creates some concurrency complications:
- There might be an "echo 1 > multicast_flood" simultaneous with an
"echo 0 > multicast_flood". The result of this is nondeterministic
either way, so I'm not too concerned as long as the result is
consistent (no other flags have changed).
- There might be an "echo 1 > multicast_flood" simultaneous with an
"echo 0 > learning". My expectation is that neither of the two writes is
"eaten", and the final flags contain BR_MCAST_FLOOD=1 and BR_LEARNING=0
regardless of the order of execution. That is actually possible if, on
the commit path, we don't do a trivial "p->flags = flags" which might
overwrite bits outside of our mask, but instead we just change the
flags corresponding to our mask.
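
To make the second case concrete, here is a minimal userspace sketch, not kernel code. It assumes both writers took their snapshot of the flags before either one committed. The EX_* bit values, struct ex_write and the ex_* helpers are hypothetical names used only for illustration.

```c
/* Illustrative bit values only; the real BR_* flags live in the kernel headers. */
#define EX_LEARNING	(1UL << 0)
#define EX_MCAST_FLOOD	(1UL << 1)

/* One sysfs write: the value computed under the lock, plus the bit it owns. */
struct ex_write {
	unsigned long flags;	/* snapshot of the port flags with our bit applied */
	unsigned long mask;	/* the single bit this write is allowed to touch */
};

/* Trivial commit, "p->flags = flags": takes every bit from the snapshot. */
static unsigned long ex_commit_overwrite(unsigned long cur, struct ex_write w)
{
	(void)cur;
	return w.flags;
}

/* Masked commit: takes only the bits in w.mask from the snapshot. */
static unsigned long ex_commit_masked(unsigned long cur, struct ex_write w)
{
	cur &= ~w.mask;
	cur |= w.flags & w.mask;
	return cur;
}

/*
 * Both writers snapshot the same initial flags (learning on, flood off),
 * then commit one after the other in the requested order.
 */
static unsigned long ex_race(int masked, int flood_first)
{
	unsigned long initial = EX_LEARNING;
	struct ex_write flood_on = { initial | EX_MCAST_FLOOD, EX_MCAST_FLOOD };
	struct ex_write learn_off = { initial & ~EX_LEARNING, EX_LEARNING };
	struct ex_write first = flood_first ? flood_on : learn_off;
	struct ex_write second = flood_first ? learn_off : flood_on;
	unsigned long cur = initial;

	cur = masked ? ex_commit_masked(cur, first) : ex_commit_overwrite(cur, first);
	cur = masked ? ex_commit_masked(cur, second) : ex_commit_overwrite(cur, second);
	return cur;
}
```

With the masked commit, ex_race() ends with only EX_MCAST_FLOOD set in either order. With the overwrite commit, whichever writer commits last undoes the other one's bit.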

Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>
---
net/bridge/br_sysfs_if.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 62540b31e356..b419d9aad548 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -68,17 +68,23 @@ static int store_flag(struct net_bridge_port *p, unsigned long v,
 	else
 		flags &= ~mask;

-	if (flags != p->flags) {
-		err = br_switchdev_set_port_flag(p, flags, mask, &extack);
-		if (err) {
-			if (extack._msg)
-				netdev_err(p->dev, "%s\n", extack._msg);
-			return err;
-		}
+	if (flags == p->flags)
+		return 0;

-		p->flags = flags;
-		br_port_flags_change(p, mask);
+	spin_unlock_bh(&p->br->lock);
+	err = br_switchdev_set_port_flag(p, flags, mask, &extack);
+	spin_lock_bh(&p->br->lock);
+	if (err) {
+		if (extack._msg)
+			netdev_err(p->dev, "%s\n", extack._msg);
+		return err;
 	}
+
+	p->flags &= ~mask;
+	p->flags |= (flags & mask);
+
+	br_port_flags_change(p, mask);
+
 	return 0;
 }

-----------------------------[cut here]-----------------------------

I figured there's a similar problem in this patch, which I had missed.
The code now looks like this:

	changed_mask = old_flags ^ p->flags;
	flags = p->flags;

	spin_unlock_bh(&p->br->lock);

	err = br_switchdev_set_port_flag(p, flags, changed_mask, extack);
	if (err) {
		spin_lock_bh(&p->br->lock);
		p->flags &= ~changed_mask;
		p->flags |= (old_flags & changed_mask);
		spin_unlock_bh(&p->br->lock);
		return err;
	}

	spin_lock_bh(&p->br->lock);

where I no longer access p->flags directly when calling
br_switchdev_set_port_flag (because I'm not protected by br->lock) but a
copy of it saved on stack. Also, I restore just the mask portion of
p->flags.
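
As a rough illustration of why the rollback restores only the mask portion, here is a small userspace sketch. The EX_* bit values and both helper names are hypothetical; it assumes another writer changed a bit outside changed_mask while the lock was dropped.

```c
/* Illustrative bit values only; the real BR_* flags live in the kernel headers. */
#define EX_LEARNING	(1UL << 0)
#define EX_MCAST_FLOOD	(1UL << 1)

/* Full restore, "p->flags = old_flags": also reverts bits we never touched. */
static unsigned long ex_rollback_full(unsigned long cur, unsigned long old_flags)
{
	(void)cur;
	return old_flags;
}

/* Masked restore: puts back only the bits this operation had changed. */
static unsigned long ex_rollback_masked(unsigned long cur, unsigned long old_flags,
					unsigned long changed_mask)
{
	cur &= ~changed_mask;
	cur |= old_flags & changed_mask;
	return cur;
}
```

Suppose old_flags had only learning set, we turned flood on (changed_mask is the flood bit), and a concurrent writer cleared learning before the driver returned an error, so the current value is just the flood bit. The masked restore yields 0: flood is rolled back and the other writer's change survives. The full restore would silently turn learning back on.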

But there's an interesting side effect of allowing
br_switchdev_set_port_flag to run concurrently (notifier call chains use
a rw_semaphore and only take the read side). Basically now drivers that
cache the brport flags in their entirety are broken, because there isn't
any guarantee that bits outside the mask are valid any longer (we can
even enforce that by masking the flags with the mask when notifying
them). They would need to do the same trick of updating just the masked
part of their cached flags, except that they would also need some sort
of spinlock, since I don't think that the basic bitwise operations are
atomic or anything like that. I'm a bit reluctant to add a spinlock in
prestera, rocker, mlxsw just for this purpose. What do you think?
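
For reference, this is roughly what the driver-side cache update would look like, as a userspace sketch and not code from any of the drivers named above. struct ex_port_cache, the ex_* helpers and the EX_* bit values are hypothetical, and a C11 atomic_flag spin loop stands in for the kernel's spinlock_t.

```c
#include <stdatomic.h>

/* Illustrative bit values only; the real BR_* flags live in the kernel headers. */
#define EX_LEARNING	(1UL << 0)
#define EX_MCAST_FLOOD	(1UL << 1)

/* Hypothetical per-port driver state caching the bridge port flags. */
struct ex_port_cache {
	atomic_flag lock;	/* userspace stand-in for a spinlock_t */
	unsigned long flags;	/* cached copy, only touched under lock */
};

static void ex_lock(struct ex_port_cache *c)
{
	/* Spin until the previous value of the flag was clear. */
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;
}

static void ex_unlock(struct ex_port_cache *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/*
 * Update the cache from a notification: only the bits in mask are
 * trusted, every other cached bit is left as it was.
 */
static void ex_cache_update(struct ex_port_cache *c, unsigned long flags,
			    unsigned long mask)
{
	ex_lock(c);
	c->flags = (c->flags & ~mask) | (flags & mask);
	ex_unlock(c);
}
```

The read-modify-write on c->flags is exactly the part that is not atomic on its own, which is why it sits between ex_lock() and ex_unlock() here.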