RE: [PATCH net-next v02 1/2] af_packet: allow fanout_add when socket is not RUNNING

From: Willem de Bruijn
Date: Fri Oct 11 2024 - 10:24:38 EST


Gur Stavi wrote:
> > Gur Stavi wrote:
> > > > Gur Stavi wrote:
> > > > > > Gur Stavi wrote:
> > > > > > > > Gur Stavi wrote:
> > > > > > > > > >> @@ -1846,21 +1846,21 @@ static int fanout_add(struct
> > sock
> > > > *sk,
> > > > > > > > struct fanout_args *args)
> > > > > > > > > >> err = -EINVAL;
> > > > > > > > > >>
> > > > > > > > > >> spin_lock(&po->bind_lock);
> > > > > > > > > >> - if (packet_sock_flag(po, PACKET_SOCK_RUNNING) &&
> > > > > > > > > >> - match->type == type &&
> > > > > > > > > >> + if (match->type == type &&
> > > > > > > > > >> match->prot_hook.type == po->prot_hook.type &&
> > > > > > > > > >> match->prot_hook.dev == po->prot_hook.dev) {
> > > > > > > > > >
> > > > > > > > > > Remaining unaddressed issue is that the socket can now be
> > > > added
> > > > > > > > > > before being bound. See comment in v1.
> > > > > > > > >
> > > > > > > > > I extended the psock_fanout test with unbound fanout test.
> > > > > > > > >
> > > > > > > > > As far as I understand, the easiest way to verify bind is
> > to
> > > > test
> > > > > > that
> > > > > > > > > po->prot_hook.dev != NULL, since we are under a bind_lock
> > > > anyway.
> > > > > > > > > But perhaps a more readable and direct approach to test
> > "bind"
> > > > > > would be
> > > > > > > > > to test po->ifindex != -1, as ifindex is commented as
> > "bound
> > > > > > device".
> > > > > > > > > However, at the moment ifindex is not initialized to -1, I
> > can
> > > > add
> > > > > > such
> > > > > > > > > initialization, but perhaps I do not fully understand all
> > the
> > > > > > logic.
> > > > > > > > >
> > > > > > > > > Any preferences?
> > > > > > > >
> > > > > > > > prot_hook.dev is not necessarily set if a packet socket is
> > bound.
> > > > > > > > It may be bound to any device. See dev_add_pack and
> > ptype_head.
> > > > > > > >
> > > > > > > > prot_hook.type, on the other hand, must be set if bound and
> > is
> > > > only
> > > > > > > > modified with the bind_lock held too.
> > > > > > > >
> > > > > > > > Well, and in packet_create. But setsockopt PACKET_FANOUT_ADD
> > also
> > > > > > > > succeeds in case bind() was not called explicitly first to
> > bind
> > > > to
> > > > > > > > a specific device or change ptype.
> > > > > > >
> > > > > > > Please clarify the last paragraph? When you say "also succeeds"
> > do
> > > > you
> > > > > > > mean SHOULD succeed or MAY SUCCEED by mistake if "something"
> > > > happens
> > > > > > ???
> > > > > >
> > > > > > I mean it succeeds currently. Which behavior must then be
> > maintained.
> > > > > >
> > > > > > > Do you refer to the following scenario: socket is created with
> > non-
> > > > zero
> > > > > > > protocol and becomes RUNNING "without bind" for all devices. In
> > > > that
> > > > > > case
> > > > > > > it can be added to FANOUT without bind. Is that considered a
> > bug or
> > > > > > does
> > > > > > > the bind requirement for fanout only apply for all-protocol (0)
> > > > > > sockets?
> > > > > >
> > > > > > I'm beginning to think that this bind requirement is not needed.
> > > > >
> > > > > I agree with that. I think that is an historical mistake that
> > socket
> > > > > becomes implicitly bound to all interfaces if a protocol is defined
> > > > > during create. Without this bind requirement would make sense.
> > > > >
> > > > > >
> > > > > > All type and dev are valid, even if an ETH_P_NONE fanout group
> > would
> > > > > > be fairly useless.
> > > > >
> > > > > Fanout is all about RX, I think that refusing fanout for socket
> > that
> > > > > will not receive any packet is OK. The condition can be:
> > > > > if (po->ifindex == -1 || !po->num)
> > > >
> > > > Fanout is not limited to sockets bound to a specific interface.
> > > > This will break existing users.
> > >
> > > For specific interface ifindex >= 1
> > > For "any interface" ifindex == 0
> > > ifindex is -1 only if the socket was created unbound with proto == 0
> > > or for the rare race case that during re-bind the new dev became
> > unlisted.
> > > For both of these cases fanout should fail.
> >
> > The only case where packet_create does not call __register_prot_hook
> > is if proto == 0. If proto is anything else, the socket will be bound,
> > whether to a device hook, or ptype_all. I don't think we need this
> > extra ifindex condition.
> >
>
> Even though "unbound" is an unlikely state for such a socket the code
> Should still address this state consistently. If do_bind sets ifindex
> to -1 on the unlikely unlisted scenario so should packet_create on the
> more likely proto == 0 scenario.

do_bind sets it to -1 for unlisted probably to copy existing behavior
in packet_notifier on NETDEV_DOWN.

But as far as I can tell nothing that uses po->ifindex differentiates
between 0 and -1. It just means that no actual device ifindex matches.

> > > >
> > > > Binding to ETH_P_NONE is useless, but we're not going to slow down
> > > > legitimate users with branches for cases that are harmless.
> > > >
> > >
> > > With "branch", do you refer to performance or something else?
> > > As I said in other mail, ETH_P_NONE could not be used in a fanout
> > > before as well because socket cannot become RUNNING with proto == 0.
> >
> > Good point.
> >
> > > For performance, we removed the RUNNING condition and added this.
> > > It is not like we need to perform 5M fanout registrations/sec. It is a
> > > syscall after all.
> >
> > It's as much about code complexity as performance. Both the patch and
> > resulting code should be as small and self-evident as possible.
> >
> > Patch v3 introduces a lot of code churn.
>
> Did you look at a side by side comparison?

Obviously

> There is really very little
> extra code.

1 file changed, 21 insertions(+), 14 deletions(-)

vs

1 file changed, 5 insertions(+), 5 deletions(-)

for v2.

> >
> > If we don't care about opening up fanout groups to ETH_P_NONE, then
> > patch v2 seems sufficient. If explicitly blocking this, the ENXIO
> > return can be added, but ideally without touching the other lines.
> >
>
> I am not the one to decide if opening it is a good idea but it will be
> ironic if a patch with the intention to remove the only-RUNNING
> restriction will end up allowing never-RUNNING sockets into a fanout
> group.

It's fine to catch that, seem to just be this change?

spin_lock(&po->bind_lock);
- if (po->running &&
+ if (po->num != ETH_P_NONE &&
match->type == type &&
match->prot_hook.type == po->prot_hook.type &&
match->prot_hook.dev == po->prot_hook.dev) {