Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode

From: Jason Xing
Date: Tue Jun 02 2020 - 22:42:24 EST


I agree with you. Upstream has already dropped and optimized this
part (commit 864e5c090749), so the problem cannot happen there. However,
older kernels such as the LTS trees still have the bug, which has caused
large-scale crashes across thousands of our machines after they have
been running for a long while. I will send the fix to the correct tree soon :)

Thanks again,
Jason

On Wed, Jun 3, 2020 at 10:29 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Tue, Jun 2, 2020 at 6:53 PM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
> >
> > Hi Eric,
> >
> > I'm sorry that I didn't write clearly enough. We're running the
> > pristine 4.19.125 Linux kernel (the latest LTS version) and have been
> > haunted by this issue. I think this patch is highly important, so
> > I'm going to resend this email with [patch 4.19] in the subject line
> > and cc Greg.
>
> Yes, please always state which tree a patch is meant for.
>
> Problem is that your patch is not correct.
> In these old kernels, tcp_internal_pacing() is called _after_ the
> packet has been sent.
> It is too late to 'give up pacing'.
>
> The packet should not have been sent if the pacing timer is queued
> (otherwise this means we do not respect pacing).
>
> So the bug should be caught earlier. Check where tcp_pacing_check()
> calls are missing.
>
>
>
> >
> >
> > Thanks,
> > Jason
> >
> > On Tue, Jun 2, 2020 at 9:05 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 2, 2020 at 1:05 AM <kerneljasonxing@xxxxxxxxx> wrote:
> > > >
> > > > From: Jason Xing <kerneljasonxing@xxxxxxxxx>
> > > >
> > > > TCP socks cannot be released because sock_hold() in
> > > > tcp_internal_pacing() increases sk_refcnt when an RTO happens.
> > > > As a result, slab memory keeps growing and can eventually trigger
> > > > an OOM if the machine has been running for a long time. On some
> > > > machines, however, the issue can appear after only a few days.
> > > >
> > > > We add one exception case to skip the unneeded sock_hold() when the
> > > > pacing_timer is already queued.
> > > >
> > > > Steps to reproduce:
> > > > 0) cat /proc/slabinfo | grep TCP
> > > > 1) switch net.ipv4.tcp_congestion_control to bbr
> > > > 2) use the wrk tool or something similar to send packets
> > > > 3) use tc to increase the delay on the device to simulate the busy case
> > > > 4) cat /proc/slabinfo | grep TCP
> > > > 5) kill the wrk command and observe the number of objects and slabs in TCP
> > > > 6) finally, notice that the number does not decrease
> > > >
> > > > Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
> > > > Signed-off-by: liweishi <liweishi@xxxxxxxxxxxx>
> > > > Signed-off-by: Shujin Li <lishujin@xxxxxxxxxxxx>
> > > > ---
> > > > net/ipv4/tcp_output.c | 3 ++-
> > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > index cc4ba42..5cf63d9 100644
> > > > --- a/net/ipv4/tcp_output.c
> > > > +++ b/net/ipv4/tcp_output.c
> > > > @@ -969,7 +969,8 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
> > > > u64 len_ns;
> > > > u32 rate;
> > > >
> > > > - if (!tcp_needs_internal_pacing(sk))
> > > > + if (!tcp_needs_internal_pacing(sk) ||
> > > > + hrtimer_is_queued(&tcp_sk(sk)->pacing_timer))
> > > > return;
> > > > rate = sk->sk_pacing_rate;
> > > > if (!rate || rate == ~0U)
> > > > --
> > > > 1.8.3.1
> > > >
> > >
> > > Hi Jason.
> > >
> > > Please do not send patches that do not apply to current upstream trees.
> > >
> > > Instead, backport to your kernels the needed fixes.
> > >
> > > I suspect that you are not using a pristine linux kernel, but some
> > > heavily modified one and something went wrong in your backports.
> > > Do not ask us to spend time finding what went wrong.
> > >
> > > Thank you.
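
The reference-count argument in this thread can be illustrated with a small
userspace toy model. This is only a sketch under stated assumptions, not kernel
code: every toy_* name below is hypothetical, and the model assumes that the
4.19-era tcp_internal_pacing() arms the pacing timer and calls sock_hold()
unconditionally, while the timer handler drops exactly one reference per
expiry. It compares three cases: the unpatched behaviour, the guard proposed
in the patch, and the approach Eric describes, where the sender checks the
pacing timer before transmitting at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel objects; these are NOT the real kernel types. */
struct toy_sock {
	int refcnt;        /* models sk_refcnt */
	bool timer_queued; /* models hrtimer_is_queued() on the pacing timer */
};

static void toy_sock_hold(struct toy_sock *sk)
{
	sk->refcnt++;
}

static void toy_sock_put(struct toy_sock *sk)
{
	assert(sk->refcnt > 0);
	sk->refcnt--;
}

/* Assumed model of tcp_internal_pacing(): arm the timer, take a reference. */
static void toy_internal_pacing(struct toy_sock *sk, bool skip_if_queued)
{
	if (skip_if_queued && sk->timer_queued)
		return; /* the guard proposed in the patch */
	/* Re-arming a queued timer still leaves only one pending expiry. */
	sk->timer_queued = true;
	toy_sock_hold(sk);
}

/* Assumed model of the timer handler: one expiry drops one reference. */
static void toy_pacing_timer_fire(struct toy_sock *sk)
{
	if (!sk->timer_queued)
		return;
	sk->timer_queued = false;
	toy_sock_put(sk);
}

/* Models a tcp_pacing_check()-style test: true means "do not send yet". */
static bool toy_pacing_check(const struct toy_sock *sk)
{
	return sk->timer_queued;
}

/*
 * Attempt `sends` back-to-back transmissions, let the timer fire once,
 * and return how many references were never dropped.
 */
static int toy_leaked_refs(int sends, bool skip_if_queued,
			   bool check_before_send)
{
	struct toy_sock sk = { .refcnt = 1, .timer_queued = false };

	for (int i = 0; i < sends; i++) {
		if (check_before_send && toy_pacing_check(&sk))
			continue; /* packet is held back, nothing is sent */
		toy_internal_pacing(&sk, skip_if_queued);
	}
	toy_pacing_timer_fire(&sk);
	return sk.refcnt - 1; /* subtract the owner's initial reference */
}
```

In this model, calling toy_leaked_refs() with both flags false leaks one
reference for every send that re-arms an already queued timer. Both the patch's
guard and the check-before-send variant bring the leak to zero, but only the
latter also avoids transmitting while the timer is pending, which is the
distinction Eric draws above.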