Re: [PATCH v4 1/2] timer: add a function to adjust timeouts to be upper bound
From: Josh Poimboeuf
Date: Fri Apr 08 2022 - 01:39:48 EST
On Fri, Apr 08, 2022 at 02:37:25AM +0200, Thomas Gleixner wrote:
> "Make sure TCP keepalive timer does not expire late. Switching to upper
> bound timers means it can fire off early but in case of keepalive
> tcp_keepalive_timer() handler checks elapsed time and resets the timer
> if it was triggered early. This results in timer "cascading" to a
> higher precision and being just a couple of milliseconds off its
> original mark."
>
> Which reinvents the cascading effect of the original timer wheel just
> with more overhead. Where is the justification for this?
>
> Is this really true for all the reasons where the keep alive timers are
> armed? I seriously doubt that. Why?
>
> On the end which waits for the keep alive packet to arrive in time it
> does not matter at all, whether the cutoff is a bit later than defined.
>
> So why do you want to let the timer fire early just to rearm it?
>
> But it matters a lot on the sender side. If that is late and the other
> end is strict about the timeout then you lost. But does it matter
> whether you send the packet too early? No, it does not matter at all
> because the important point is that you send it _before_ the other side
> decides to give up.
>
> So why do you want to let the timer fire precise?
>
> You are solving the sender side problem by introducing a receiver side
> problem and both suffer from the overhead for no reason.

Here are my thoughts.  Maybe some networking folks can chime in to
keep us honest.

I get most of what you're saying, though my understanding is that
keepalive is only involved in sending packets, not receiving them.  I do
think there would be two opposing use cases:

  1) Client sending packets to prevent server disconnects

  2) Server sending packets to detect client disconnects

For #1, it's ok for the timer to pop early.  For #2, it's ok for it to
pop late.  So my conclusion is about the same as your sender/receiver
scenario: there are two sides to the same coin.

If we assume both use cases are valid (which I'm not entirely convinced
of), doesn't that mean that the keepalive timer needs to be precise?
Otherwise we're going to have broken expectations in one direction or
the other, depending on the use case.

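To make the early/late trade-off a bit more concrete, here is a toy
model of the two behaviours.  It is only a sketch, not the patch and not
the kernel code: it assumes HZ=1000 (one tick per millisecond) and the
approximate level layout from the comment table in kernel/time/timer.c,
and the function names are made up for illustration.

```python
# Toy model of timer wheel granularity. Assumes HZ=1000 and the level
# layout from the comment table in kernel/time/timer.c; level boundaries
# are simplified and all names here are hypothetical.
LVL_CLK_SHIFT = 3   # each level is 8x coarser than the one below it
LVL_BITS = 6        # 64 buckets per level
LVL_SIZE = 1 << LVL_BITS
LVL_DEPTH = 9


def granularity(delta):
    """Bucket width (in ticks) for a timer 'delta' ticks in the future."""
    for lvl in range(LVL_DEPTH):
        if delta < (LVL_SIZE << (lvl * LVL_CLK_SHIFT)):
            return 1 << (lvl * LVL_CLK_SHIFT)
    return 1 << ((LVL_DEPTH - 1) * LVL_CLK_SHIFT)


def fire_late(now, deadline):
    """Round the expiry up to a bucket boundary: never early, maybe late."""
    gran = granularity(deadline - now)
    return -(-deadline // gran) * gran


def fire_early_and_rearm(now, deadline):
    """Round the expiry down; the handler rearms for the remaining time.

    Returns (time at which the deadline is finally reached, wakeups).
    """
    wakeups = 0
    while now < deadline:
        gran = granularity(deadline - now)
        now = (deadline // gran) * gran   # fires early, on a boundary
        wakeups += 1                      # handler runs, rearms if needed
    return now, wakeups
```

Under these assumptions a 300000 ms timeout armed at tick 0 lands in a
level with 32768 ms granularity.  fire_late() gives 327680, while
fire_early_and_rearm() reaches 300000 after three wakeups.  How late the
first variant really is depends on where the arm time falls relative to
the bucket boundaries.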
> Aside of the theoretical issue why this matters at all I have yet to see
> a reasonable argument what the practical problem is. If this would be a
> real problem in the wild then why haven't we seen a reasonable bug
> report within 6 years?

Good question.  At least part of the answer *might* be that enterprise
kernels tend to be adopted very slowly.  This issue was reported on RHEL
8.3 which is a 4.18 based kernel:

  The time that the 1st TCP keepalive probe is sent can be configured by
  the "net.ipv4.tcp_keepalive_time" sysctl or by setsockopt().

  We observe that if that value is set to 300 seconds, the timer
  actually fires around 15-20 seconds later.  So ~317 seconds.  The larger
  the expiration time the greater the delay.  So for the default of 2
  hours it can be delayed by minutes.  This is causing problems for some
  customers that rely on the TCP keepalive timer to keep entries active
  in firewalls and expect it to be accurate as TCP keepalive values have
  to correspond to the firewall settings.

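For reference, the setsockopt() path mentioned in the report looks
roughly like this from userspace.  This is a minimal sketch, assuming
Linux and Python's socket module; enable_keepalive() is a made-up helper
name, and 300 is just the value from the report.

```python
import socket


def enable_keepalive(sock, idle_seconds):
    """Turn on keepalive and set the idle time before the first probe.

    TCP_KEEPIDLE is the per-socket counterpart of the
    net.ipv4.tcp_keepalive_time sysctl (Linux-specific option).
    Returns the value the kernel reports back.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
```

A firewall with a 300 second idle timeout would typically be paired
with something like enable_keepalive(sock, 300) or lower, which is why
a probe going out ~17 seconds late is a problem for that setup.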
--
Josh