Re: Re: [PATCH] tcp: Add an extra check for consecutive failed keepalive probes

From: Neal Cardwell
Date: Fri Jan 10 2025 - 11:30:57 EST


On Fri, Jan 10, 2025 at 11:22 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
>
>
> On Fri, Jan 10, 2025 at 4:58 PM lizhe <sensor1010@xxxxxxx> wrote:
> >
> > Hi, Neal
> >
> >
> > If the TCP_USER_TIMEOUT option is not enabled, and attempts to send TCP keepalive probes continuously fail,
> >
> > then who limits the number of increments to icsk->icsk_probes_out?
> >
> >
> > Adding this code is feasible. If not added, the system would continuously send keepalive probes without any limit.
> >
> > If these probes continually fail, the process would persist indefinitely because there would be no measure in place to restrict the increments of icsk->icsk_probes_out++.
> >
> >
>
> I think you should provide a packetdrill test, as Neal suggested.
>
> If you write a packetdrill test, chances are very high you will see the code is currently fine.

Yes, indeed. :-)

AFAICT we don't even need a new packetdrill test... I provided a
"repeated keepalive failures" packetdrill test earlier in this thread.
If folks run that test, they should find that recent Linux kernels
correctly terminate a connection after the configured number of
keepalive probes fail.

If there is some corner case we are missing, then you might want to
start from that packetdrill test I pasted in, and try to demonstrate
your corner case that you think the code is missing.

thanks,
neal

> In any case, you have to provide a Fixes: tag for any bug fix for networking code,
> as explained in Documentation/process/maintainer-netdev.rst
>
> Thank you.
>
> >
> > _Lizhe,
> >
> > thx
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2025-01-10 00:31:55, "Neal Cardwell" <ncardwell@xxxxxxxxxx> wrote:
> > >On Thu, Jan 9, 2025 at 11:21 AM Lizhe <sensor1010@xxxxxxx> wrote:
> > >>
> > >> Add an additional check to handle situations where consecutive
> > >> keepalive probe packets are sent without receiving a response.
> > >>
> > >> Signed-off-by: Lizhe <sensor1010@xxxxxxx>
> > >> ---
> > >> net/ipv4/tcp_timer.c | 6 ++++++
> > >> 1 file changed, 6 insertions(+)
> > >>
> > >> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> > >> index b412ed88ccd9..5a5dee8cd6d3 100644
> > >> --- a/net/ipv4/tcp_timer.c
> > >> +++ b/net/ipv4/tcp_timer.c
> > >> @@ -828,6 +828,12 @@ static void tcp_keepalive_timer (struct timer_list *t)
> > >> }
> > >> if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
> > >> icsk->icsk_probes_out++;
> > >> + if (icsk->icsk_probes_out >= keepalive_probes(tp)) {
> > >> + tcp_send_active_reset(sk, GFP_ATOMIC,
> > >> + SK_RST_REASON_TCP_KEEPALIVE_TIMEOUT);
> > >> + tcp_write_err(sk);
> > >> + goto out;
> > >> + }
> > >> elapsed = keepalive_intvl_when(tp);
> > >> } else {
> > >> /* If keepalive was lost due to local congestion,
> > >> --
> > >
> > >Can you please explain the exact motivation for your patch, ideally
> > >providing either a tcpdump trace or packetdrill test to document the
> > >scenario you are concerned about?
> > >
> > >The Linux TCP keepalive logic in tcp_keepalive_timer() already
> > >includes logic (a few lines above the spot you propose to patch) that
> > >ensures that a connection will be closed with ETIMEDOUT if consecutive
> > >keepalive probes fail:
> > >
> > > if ((user_timeout != 0 &&
> > > elapsed >= msecs_to_jiffies(user_timeout) &&
> > > icsk->icsk_probes_out > 0) ||
> > > (user_timeout == 0 &&
> > > icsk->icsk_probes_out >= keepalive_probes(tp))) {
> > > tcp_send_active_reset(sk, GFP_ATOMIC,
> > >
> > >SK_RST_REASON_TCP_KEEPALIVE_TIMEOUT);
> > > tcp_write_err(sk);
> > > goto out;
> > > }
> > > if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
> > > icsk->icsk_probes_out++;
> > > elapsed = keepalive_intvl_when(tp);
> > >
> > >AFAICT your patch nearly duplicates the existing logic, but changes
> > >the application-visible behavior to close the connection after one
> > >fewer timer expiration, thus breaking the semantics of the
> > >net.ipv4.tcp_keepalive_probes.
> > >
> > >neal
> > >
> > >---
> > >
> > >ps: For reference, here is a packetdrill test we use to test this
> > >functionality; this passes on recent Linux kernels:
> > >
> > >// Test TCP keepalive behavior without TCP timestamps enabled.
> > >
> > >`../common/defaults.sh`
> > >
> > >// Create a socket.
> > > 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> > > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> > >
> > > +0 bind(3, ..., ...) = 0
> > > +0 listen(3, 1) = 0
> > >
> > >// Establish a connection.
> > > +0 < S 0:0(0) win 20000 <mss 1000,nop,nop,sackOK,nop,wscale 8>
> > > +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
> > > +.1 < . 1:1(0) ack 1 win 20000
> > > +0 accept(3, ..., ...) = 4
> > >
> > >// Verify keepalives are disabled by default.
> > > +0 getsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [0], [4]) = 0
> > >// Enable keepalives:
> > > +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
> > >
> > >// Verify default TCP_KEEPIDLE is 7200, from net.ipv4.tcp_keepalive_time=7200:
> > > +0 getsockopt(4, SOL_TCP, TCP_KEEPIDLE, [7200], [4]) = 0
> > >// Start sending keepalive probes after 3 seconds of idle
> > > +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [3], 4) = 0
> > >
> > >// Verify default TCP_KEEPINTVL is 75, from net.ipv4.tcp_keepalive_intvl=75:
> > > +0 getsockopt(4, SOL_TCP, TCP_KEEPINTVL, [75], [4]) = 0
> > >// Send keepalive probes every 2 seconds.
> > > +0 setsockopt(4, SOL_TCP, TCP_KEEPINTVL, [2], 4) = 0
> > >
> > >// Verify default TCP_KEEPCNT is 9, from net.ipv4.tcp_keepalive_probes=9:
> > > +0 getsockopt(4, SOL_TCP, TCP_KEEPCNT, [9], [4]) = 0
> > >// Send 4 keepalive probes before giving up.
> > > +0 setsockopt(4, SOL_TCP, TCP_KEEPCNT, [4], 4) = 0
> > >
> > >// Set up an epoll operation to verify that connections terminated by failed
> > >// keepalives will wake up blocked epoll waiters with EPOLLERR|EPOLLHUP:
> > > +0 epoll_create(1) = 5
> > > +0 epoll_ctl(5, EPOLL_CTL_ADD, 4, {events=EPOLLERR, fd=4}) = 0
> > > +0...11 epoll_wait(5, {events=EPOLLERR|EPOLLHUP, fd=4}, 1, 15000) = 1
> > >
> > >// Verify keepalive behavior looks correct, given the parameters above:
> > >
> > >// Start sending keepalive probes after 3 seconds of idle.
> > > +3 > . 0:0(0) ack 1
> > >// Send keepalive probes every 2 seconds.
> > > +2 > . 0:0(0) ack 1
> > > +2 > . 0:0(0) ack 1
> > > +2 > . 0:0(0) ack 1
> > > +2 > R. 1:1(0) ack 1
> > >// Sent 4 keepalive probes and then gave up and reset the connection.
> > >
> > >// Verify that we get the expected error when we try to use the socket:
> > > +0 read(4, ..., 1000) = -1 ETIMEDOUT (Connection timed out)