Re: [PATCH] xprtsock: Fix a loop in xs_tcp_setup_socket()

From: NeilBrown
Date: Sun Apr 21 2024 - 23:45:19 EST


On Mon, 22 Apr 2024, Lex Siegel wrote:
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM. If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
>
> The question is whether a permission error, EPERM, should cause a retry or
> return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear,
> the connect call will have to not return a permission error to halt the retry
> attempts.
>
> This is a default behavior because EPERM is not an explicit case of the switch
> statement. Because bpf appropriately uses EPERM to show that the kernel_connect
> was not permitted, it highlights the return handling for this case is missing.
> It is unlikely that retry was ever the intended result.
>
> Upstream, the bpf that caused this is at:
> https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336
>
> This cilium bpf code has two return statuses, EPERM and ENXIO, that fall
> through to the default case of retrying. Here, cilium expects both of these
> statuses to indicate the connect failed. A retry is not the intended result.
>
> Handling this case without a retry aligns this code with the udp behavior. This
> precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef.
>
> I will amend my patch to include an explicit case for ENXIO as well, as this is
> also in cilium's bpf and will cause the same bug to occur.
>

I think it should be up to cilium to report an errno that the kernel
understands, not up to the kernel to understand whatever errno cilium
chooses to return.

I don't think EPERM or ENXIO are appropriate errors for network
problems.
EHOSTUNREACH or ECONNREFUSED would make much more sense.

NeilBrown


>
> On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@xxxxxxx> wrote:
> >
> > On Sat, 20 Apr 2024, Lex Siegel wrote:
> > > When using a bpf on kernel_connect(), the call can return -EPERM.
> > > This causes xs_tcp_setup_socket() to loop forever, filling up the
> > > syslog and causing the kernel to freeze up.
> > >
> > > Signed-off-by: Lex Siegel <usiegl00@xxxxxxxxx>
> > > ---
> > > net/sunrpc/xprtsock.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > index bb9b747d58a1..47b254806a08 100644
> > > --- a/net/sunrpc/xprtsock.c
> > > +++ b/net/sunrpc/xprtsock.c
> > > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> > > /* Happens, for instance, if the user specified a link
> > > * local IPv6 address without a scope-id.
> > > */
> > > + case -EPERM:
> > > + /* Happens, for instance, if a bpf is preventing the connect */
> >
> > This will propagate -EPERM up into other layers which might not be ready
> > to handle it.
> > It might be safer to map EPERM to an error we would be more likely to
> > expect from the network system - such as ECONNREFUSED or ENETDOWN.
> >
> > Better still would be for kernel_connect() to return a more normal error
> > code - not EPERM. If that cannot be achieved, then I think it would be
> > best for the sunrpc code to map EPERM to something else at the place
> > where kernel_connect() is called - catch it early.
> >
> > NeilBrown
> >
> >
> > > case -ECONNREFUSED:
> > > case -ECONNRESET:
> > > case -ENETDOWN:
> > > --
> > > 2.39.3
> > >
> > >
> >
>