Re: [PATCH net] netfilter: nf_nat_masquerade: recalculate TCP TS offset when port is randomized

From: Jiayuan Chen

Date: Tue Jun 30 2026 - 21:44:33 EST



On 6/29/26 5:34 PM, xietangxin wrote:
The problem was observed in Kubernetes environments where the MASQUERADE
target with --random-fully is configured by default. After commit
165573e41f2f ("tcp: secure_seq: add back ports to TS offset"), TCP
short-connection QPS dropped from ~20000 to ~10000. That commit added the
source and destination ports to the TS offset calculation.

However, with MASQUERADE --random-fully, when multiple internal connections
(e.g. sport 10000 and 20000) are mapped to the same external port (e.g. 30000),
their TS offsets are calculated as ts_offset(10000) and ts_offset(20000).
If the server reuses the TIME_WAIT slot from the first connection, there is
a chance that ts_offset(20000) < ts_offset(10000), breaking TSval
monotonicity for the same 4-tuple and causing RST packets:
Client -> Server 24870 -> 80 [SYN] TSval=2294041168
Server -> Client 80 -> 24870 [ACK] TSecr=2846236456
Client -> Server 24870 -> 80 [RST] Seq=855605690
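
A minimal userspace sketch of the failure mode described above. The hash here
is a toy stand-in, not the kernel's keyed hash, and all function names are
hypothetical. It only illustrates that a TSval built from a per-port offset
can compare as "older" under serial-number arithmetic when the offset changes
while the server-visible 4-tuple stays the same.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a per-connection TS offset. NOT the kernel's hash:
 * an assumption made only so the example is self-contained. */
static uint32_t toy_ts_offset(uint16_t sport, uint16_t dport)
{
	uint32_t h = ((uint32_t)sport << 16) | dport;

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h;
}

/* TSval as the peer sees it: sender clock plus the per-connection offset. */
static uint32_t tsval_on_wire(uint32_t clock_ms, uint16_t sport, uint16_t dport)
{
	return clock_ms + toy_ts_offset(sport, dport);
}

/* Serial-number comparison: true if tsval is older than ts_recent. */
static bool ts_goes_backwards(uint32_t ts_recent, uint32_t tsval)
{
	return (int32_t)(tsval - ts_recent) < 0;
}
```

With the numbers from the trace, and assuming the TSecr in the server's ACK
reflects the timestamp it remembered for the old connection,
ts_goes_backwards(2846236456u, 2294041168u) is true: the new SYN looks older
than the previous connection on the same 4-tuple.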

After nf_nat_setup_info() successfully assigns a new randomized
source port, recalculate the TS offset using the new port and
update the SYN packet's TSval accordingly.
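
The rewrite step described above could look roughly like this in userspace.
This is a sketch under assumptions, not the code from the patch: the helper
names are hypothetical, the old and new offsets are passed in by the caller,
and the TCP checksum update that a real packet rewrite would also need is
left out.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_EOL           0
#define OPT_NOP           1
#define OPT_TIMESTAMP     8
#define OPT_TIMESTAMP_LEN 10

/* Return the byte index of TSval inside the option area, or -1. */
static int find_tsval(const uint8_t *opts, size_t optlen)
{
	size_t off = 0;

	while (off < optlen) {
		uint8_t kind = opts[off];
		uint8_t len;

		if (kind == OPT_EOL)
			break;
		if (kind == OPT_NOP) {
			off++;
			continue;
		}
		if (off + 1 >= optlen)
			break;
		len = opts[off + 1];
		if (len < 2 || off + len > optlen)
			break;
		if (kind == OPT_TIMESTAMP && len == OPT_TIMESTAMP_LEN)
			return (int)(off + 2);	/* skip kind and length bytes */
		off += len;
	}
	return -1;
}

/* Replace the old offset in TSval with the new one. Returns 0 on success. */
static int rebase_tsval(uint8_t *opts, size_t optlen,
			uint32_t old_off, uint32_t new_off)
{
	int idx = find_tsval(opts, optlen);
	uint32_t tsval;

	if (idx < 0)
		return -1;
	memcpy(&tsval, opts + idx, sizeof(tsval));	/* may be unaligned */
	tsval = htonl(ntohl(tsval) - old_off + new_off);
	memcpy(opts + idx, &tsval, sizeof(tsval));
	return 0;
}
```

The subtraction and addition wrap modulo 2^32, which matches how TSval itself
wraps, so no special overflow handling is needed in this sketch.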

Test results on 4U4G VM with
`./wrk -t8 -c200 -H "Connection: close" -d10s --latency http://5.5.5.5:80`
Before:
random:10712 req/s, random-fully:10986 req/s
After:
random:21463 req/s, random-fully:19181 req/s

Fixes: 165573e41f2f ("tcp: secure_seq: add back ports to TS offset")
Cc: stable@xxxxxxxxxxxxxxx


I'd treat this as a feature, not a fix.


Closes: https://lore.kernel.org/all/92935c00-e0be-4591-ac44-5978c7804d57@xxxxxxxx/
Signed-off-by: xietangxin <xietangxin@xxxxxxxxxxxxxx>
---
net/netfilter/nf_nat_masquerade.c | 91 ++++++++++++++++++++++++++++++-
1 file changed, 89 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_nat_masquerade.c b/net/netfilter/nf_nat_masquerade.c
index 4de6e0a51701..8c9ca5a051cc 100644
--- a/net/netfilter/nf_nat_masquerade.c
+++ b/net/netfilter/nf_nat_masquerade.c
@@ -6,8 +6,11 @@
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/netfilter_ipv6.h>
+#include <linux/tcp.h>
+#include <net/tcp.h>
#include <net/netfilter/nf_nat_masquerade.h>
+#include <net/secure_seq.h>
struct masq_dev_work {
struct work_struct work;
@@ -24,6 +27,76 @@ static DEFINE_MUTEX(masq_mutex);
static unsigned int masq_refcnt __read_mostly;
static atomic_t masq_worker_count __read_mostly;
+static __be32 *tcp_ts_option_ptr(const struct sk_buff *skb)
+{
+	const struct tcphdr *th;
+	unsigned char *ptr;
+	unsigned char opsize;
+	unsigned int optlen, offset;
+
+	th = tcp_hdr(skb);
+	optlen = (th->doff - 5) * 4;
+	ptr = (unsigned char *)(th + 1);
+	offset = 0;
+
+	while (offset < optlen) {
+		unsigned char opcode = ptr[offset];
+
+		if (opcode == TCPOPT_EOL)
+			break;
+		if (opcode == TCPOPT_NOP) {
+			offset++;
+			continue;
+		}
+
+		if (offset + 1 >= optlen)
+			break;
+
+		opsize = ptr[offset + 1];
+		if (opsize < 2 || offset + opsize > optlen)
+			break;
+
+		if (opcode == TCPOPT_TIMESTAMP && opsize == TCPOLEN_TIMESTAMP)
+			return (__be32 *)(ptr + offset + 2);
+
+		offset += opsize;
+	}
+
+	return NULL;
+}
+
+static void masquerade_update_tcp_ts_offset(struct nf_conn *ct, struct sk_buff *skb)
+{
+	__be32 *tsptr;
+	struct net *net;
+	struct tcphdr *th;
+	struct tcp_sock *tp;
+	union tcp_seq_and_ts_off st;
+	struct nf_conntrack_tuple *tuple;
+
+	th = tcp_hdr(skb);
+	net = nf_ct_net(ct);
+	tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+

Why use the reply tuple rather than the original one, or am I missing something?
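
For context on the question, here is a toy model of how the two tuples relate
after source NAT. The structs and helpers are hypothetical simplifications,
not the conntrack API. In this model the original tuple keeps the pre-NAT
source, while the reply tuple's destination carries the translated address
and port, i.e. what the server actually sees. Whether that is the reason for
the choice in the patch is not stated in the thread.

```c
#include <stdint.h>

struct toy_endpoint {
	uint32_t addr;
	uint16_t port;
};

struct toy_tuple {
	struct toy_endpoint src, dst;
};

/* Hypothetical, simplified connection entry with both directions. */
struct toy_conn {
	struct toy_tuple original;	/* as sent by the client, pre-NAT */
	struct toy_tuple reply;		/* as expected back from the server */
};

/* Model source NAT: only the reply direction learns the translated source. */
static struct toy_conn toy_snat(struct toy_endpoint client,
				struct toy_endpoint server,
				struct toy_endpoint nat)
{
	struct toy_conn c;

	c.original.src = client;
	c.original.dst = server;
	c.reply.src = server;
	c.reply.dst = nat;
	return c;
}

/* The 4-tuple the server sees is the reply tuple with src and dst swapped. */
static struct toy_tuple toy_wire_tuple(const struct toy_conn *c)
{
	struct toy_tuple t;

	t.src = c->reply.dst;
	t.dst = c->reply.src;
	return t;
}
```

With a client at port 10000 masqueraded to port 30000, toy_wire_tuple()
yields source port 30000, while the original tuple still says 10000.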