Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode

From: Eric Dumazet
Date: Tue Jun 02 2020 - 22:29:42 EST


On Tue, Jun 2, 2020 at 6:53 PM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
>
> Hi Eric,
>
> I'm sorry that I didn't explain this clearly enough. We're running the
> pristine 4.19.125 Linux kernel (the latest LTS version) and have been
> hitting this issue. I think this patch is important, so I'm going to
> resend this email with [patch 4.19] in the subject line and cc Greg.

Yes, please always state which tree a patch is meant for.

The problem is that your patch is not correct.
In these old kernels, tcp_internal_pacing() is called _after_ the
packet has been sent.
By then it is too late to 'give up pacing'.

The packet should not have been sent if the pacing timer is queued
(otherwise this means we do not respect pacing).

So the bug should be caught earlier. Check where tcp_pacing_check()
calls are missing.
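
To illustrate the point above, here is a minimal user-space sketch, not kernel code. All model_* names are hypothetical stand-ins, and it assumes that the pacing path takes one reference each time it arms the timer while the timer handler drops only one reference per expiry. Under those assumptions, sending while the timer is already queued leaks references, and a check made before the send avoids it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the socket state that matters here. */
struct model_sock {
	int refcnt;        /* models sk_refcnt */
	bool timer_queued; /* models hrtimer_is_queued() on pacing_timer */
};

/* Models tcp_pacing_check(): true means "do not send now". */
static bool model_pacing_check(const struct model_sock *sk)
{
	return sk->timer_queued;
}

/*
 * Models tcp_internal_pacing() running after a transmit: the timer is
 * (re)armed and one reference is taken on every call (assumption).
 */
static void model_internal_pacing(struct model_sock *sk)
{
	sk->timer_queued = true; /* re-arming an armed timer keeps one timer */
	sk->refcnt++;            /* models sock_hold() */
}

/* Models the timer handler: one expiry drops exactly one reference. */
static void model_timer_fire(struct model_sock *sk)
{
	if (!sk->timer_queued)
		return;
	sk->timer_queued = false;
	sk->refcnt--;            /* models sock_put() */
}

/* Models a send path; returns true if the packet went out. */
static bool model_send(struct model_sock *sk, bool check_pacing)
{
	if (check_pacing && model_pacing_check(sk))
		return false;    /* respect pacing: nothing is sent */
	/* the packet would be transmitted here */
	model_internal_pacing(sk);
	return true;
}

/*
 * Returns how many references remain leaked after `sends` back-to-back
 * send attempts followed by a single timer expiry.
 */
static int model_leaked_refs(int sends, bool check_pacing)
{
	struct model_sock sk = { .refcnt = 1, .timer_queued = false };

	for (int i = 0; i < sends; i++)
		model_send(&sk, check_pacing);
	model_timer_fire(&sk);
	return sk.refcnt - 1;
}
```

In this model, model_leaked_refs(3, false) leaves two references behind, while model_leaked_refs(3, true) leaves none because the second and third packets are never sent. Skipping the hold after the packet has already gone out would also stop the leak, but the packet would still have violated pacing.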



>
>
> Thanks,
> Jason
>
> On Tue, Jun 2, 2020 at 9:05 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > On Tue, Jun 2, 2020 at 1:05 AM <kerneljasonxing@xxxxxxxxx> wrote:
> > >
> > > From: Jason Xing <kerneljasonxing@xxxxxxxxx>
> > >
> > > TCP socks cannot be released because sock_hold() in
> > > tcp_internal_pacing() increases sk_refcnt when an RTO happens.
> > > As a result, slab memory keeps growing and can eventually trigger
> > > an OOM if the machine has been running for a long time. On some
> > > machines, however, the issue can appear after only a few days.
> > >
> > > Add one exception to avoid an unneeded sock_hold() when the
> > > pacing_timer is already queued.
> > >
> > > Steps to reproduce:
> > > 0) cat /proc/slabinfo | grep TCP
> > > 1) switch net.ipv4.tcp_congestion_control to bbr
> > > 2) use wrk or a similar tool to send packets
> > > 3) use tc to increase the delay on the device to simulate a busy network
> > > 4) cat /proc/slabinfo | grep TCP
> > > 5) kill the wrk command and observe the number of TCP objects and slabs
> > > 6) note that the number does not decrease
> > >
> > > Signed-off-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
> > > Signed-off-by: liweishi <liweishi@xxxxxxxxxxxx>
> > > Signed-off-by: Shujin Li <lishujin@xxxxxxxxxxxx>
> > > ---
> > > net/ipv4/tcp_output.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index cc4ba42..5cf63d9 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -969,7 +969,8 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
> > >  	u64 len_ns;
> > >  	u32 rate;
> > >
> > > -	if (!tcp_needs_internal_pacing(sk))
> > > +	if (!tcp_needs_internal_pacing(sk) ||
> > > +	    hrtimer_is_queued(&tcp_sk(sk)->pacing_timer))
> > >  		return;
> > >  	rate = sk->sk_pacing_rate;
> > >  	if (!rate || rate == ~0U)
> > > --
> > > 1.8.3.1
> > >
> >
> > Hi Jason.
> >
> > Please do not send patches that do not apply to current upstream trees.
> >
> > Instead, backport to your kernels the needed fixes.
> >
> > I suspect that you are not using a pristine linux kernel, but some
> > heavily modified one and something went wrong in your backports.
> > Do not ask us to spend time finding what went wrong.
> >
> > Thank you.