Re: [PATCH] netfilter: conntrack: tcp: do not lower timeout to CLOSE for in-window RSTs

From: Florian Westphal
Date: Mon Jul 08 2024 - 10:12:39 EST


yyxRoy <yyxroy22@xxxxxxxxx> wrote:
> On Fri, 5 Jul 2024 at 17:43, Florian Westphal <fw@xxxxxxxxx> wrote:
> > Also, one can send a train with a data packet + RST and we will hit
> > the immediate close conditional:
> >
> > /* Check if rst is part of train, such as
> >  * foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
> >  * foo:80 > bar:4379: R, 235946602:235946602(0) ack 42
> >  */
> > if (ct->proto.tcp.last_index == TCP_ACK_SET &&
> >     ct->proto.tcp.last_dir == dir &&
> >     seq == ct->proto.tcp.last_end)
> >         break;
> >
> > So even if we'd make this change it doesn't prevent remote induced
> > resets.
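As a userspace illustration of that train check (the struct and enum values
here are simplified stand-ins for the kernel's ct->proto.tcp fields, not the
real definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the in-kernel "RST train" check:
 * an RST that immediately follows a data packet from the same
 * direction, and whose sequence number equals the previous segment's
 * end, is treated as part of the same train and closes at once.
 */
enum { TCP_ACK_SET = 5, TCP_RST_SET = 4 };  /* values illustrative only */

struct tcp_ct_state {
	uint8_t  last_index;  /* flag index of the previous packet */
	int      last_dir;    /* direction of the previous packet */
	uint32_t last_end;    /* seq + len of the previous packet */
};

static bool rst_is_part_of_train(const struct tcp_ct_state *ct,
				 int dir, uint32_t seq)
{
	return ct->last_index == TCP_ACK_SET &&
	       ct->last_dir == dir &&
	       seq == ct->last_end;
}
```

With the sequence numbers from the comment above, a spoofed data packet
ending at 235946602 followed by an RST with seq 235946602 passes this check.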
>
> Thank you for your time and prompt reply and for bringing to my attention the case
> I had overlooked. I acknowledge that as a middlebox, Netfilter faces significant
> challenges in accurately determining the correct sequence and acknowledgment
> numbers. However, it is crucial to consider the security implications as well.

Yes, but we have to make do with the information we have (or can
observe), and we have to trade this against occupancy of the conntrack table.

> For instance, previously, an in-window RST could switch the mapping to the
> CLOSE state with a mere 10-second timeout. The recent patch,
> (netfilter: conntrack: tcp: only close if RST matches exact sequence),
> has aimed to improve security by keeping the mapping in the established state
> and extending the timeout to 300 seconds upon receiving a Challenge ACK.

be0502a3f2e9 ("netfilter: conntrack: tcp: only close if RST matches
exact sequence")?

Yes, that is a side effect. That change was about preventing a NAT mapping
from going away because of an RST packet coming from an unrelated previous
connection (Carrier-Grade NAT makes this more likely, unfortunately).

I don't know how to prevent this for RST flooding with known address/port
pairs.

> However, this patch's efforts are still insufficient to completely prevent attacks.
> As I mentioned, attackers can manipulate the TTL to prevent the peer from
> responding to the Challenge ACK, thereby reverting the mapping to the
> 10-second timeout. This duration is quite short and potentially dangerous,
> leading to various attacks, including TCP hijacking (I have included a detailed
> report on potential attacks if time permits).
> else if (unlikely(index == TCP_RST_SET))
>         timeout = timeouts[TCP_CONNTRACK_CLOSE];
>
> The problem is that current netfilter only checks if the packet has the RST flag
> (index == TCP_RST_SET) and lowers the timeout to that of CLOSE (10 seconds only).
> I strongly recommend implementing measures to prevent such vulnerabilities.

I don't know how.

We can track TTL/NH.
We can track TCP timestamps.

But how would we use such extra information?
E.g., what if we observe:

ACK, TTL 32
ACK, TTL 31
ACK, TTL 30
ACK, TTL 29

... will we just refuse to update the tracked TTL?
If we reduce it, an attacker can shrink it to the low value needed
to prevent a later RST from reaching the end host.

If we don't, the connection could get stuck after a legitimate route change.
And what about malicious entities injecting FIN/SYN packets instead of RSTs?
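To make the dilemma concrete, here is a rough userspace model of a
TTL-tracking heuristic (entirely hypothetical, not existing conntrack code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical TTL tracker illustrating the problem described above:
 * if observed packets may shrink the stored TTL, an attacker who can
 * spoof ACKs walks it down until a later legitimate RST is refused;
 * if we never shrink it, a legitimate route change to a longer path
 * makes us distrust the peer's own packets.
 */
struct ttl_track {
	uint8_t min_ttl;  /* lowest TTL seen so far on this flow */
};

static void ttl_observe(struct ttl_track *t, uint8_t ttl)
{
	if (ttl < t->min_ttl)
		t->min_ttl = ttl;  /* shrinking step: attacker-controllable */
}

static bool rst_ttl_ok(const struct ttl_track *t, uint8_t ttl)
{
	return ttl >= t->min_ttl;
}
```

Feeding it the ACK sequence above (TTL 32, 31, 30, 29) drags min_ttl down to
29, after which the tracker happily accepts an RST sent with TTL 29.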

If we have the last ts.echo from the remote side, we can make it harder,
but what do we do if the RST doesn't carry a timestamp?

That could be perfectly legal when the machine lost state, e.g. was
power-cycled, so we can't ignore such RSTs.
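A timestamp-based check could therefore only ever be advisory, roughly along
these lines (a sketch with made-up names; note the mandatory fall-through when
the RST carries no timestamp option):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heuristic: an RST whose echoed timestamp (ts.echo /
 * TSecr) is older than the last echo we saw from that peer looks
 * suspicious -- but an RST without any timestamp option must still be
 * accepted, since a peer that lost state (e.g. after a power cycle)
 * sends exactly that.
 */
static bool rst_ts_plausible(bool has_ts, uint32_t rst_tsecr,
			     uint32_t last_seen_tsecr)
{
	if (!has_ts)
		return true;  /* cannot reject: legal after peer state loss */

	/* accept equal or newer echo values, modulo-2^32 comparison */
	return (int32_t)(rst_tsecr - last_seen_tsecr) >= 0;
}
```

So even with timestamps tracked, the no-timestamp escape hatch remains open
to an attacker who simply omits the option.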

> For example, in the case of an in-window RST, could we consider lowering
> the timeout to 300 seconds or else?

Yes, but I don't see how it helps.
An attacker can prepend a data packet and we'd still move to CLOSE.

And I don't really want to change that, because it helps to get rid
of stale connections with real/normal traffic.

I'm worried that adding cases where we do not act on RSTs will cause
the conntrack table to fill up.