Re: [PATCH net-next v2 05/15] tcp: allow mptcp to drop TS for some packets

From: Paolo Abeni

Date: Thu Jun 11 2026 - 03:46:36 EST


Hi Eric,

On 6/5/26 11:21 AM, Matthieu Baerts (NGI0) wrote:
> With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
> taking 30 bytes, the 40-byte limit for the TCP options is reached. In
> this case, it is then not possible to send the address signal.
>
> The idea is to let MPTCP dropping the TCP-timestamps option for some
> specific packets, to be able to send some specific pure ACK carrying >28
> bytes of MPTCP options, like with this specific ADD_ADDR. A new
> parameter is passed from tcp_established_options to the MPTCP side to
> indicate if the TCP TS option is used, and if it should be dropped. The
> next commit implements the part on MPTCP side, but split into two
> patches to help TCP maintainers to identify the modifications on TCP
> side. This feature will be controlled by a new add_addr_v6_port_drop_ts
> MPTCP sysctl knob.
>
> It is important to keep in mind that dropping the TCP timestamps option
> for one packet of the connection could eventually disrupt some
> middleboxes: even if it should be unlikely, they could drop the packet
> or even block the connection. That's why this new feature will be
> controlled by a sysctl knob.
>
> Note that it would be technically possible to squeeze both options into
> the header if the ADD_ADDR is first written, and then the TCP timestamps
> without the NOPs preceding it. But this means more modifications on TCP
> side, plus some middleboxes could still be disrupted by that.
>
> In this implementation, an unused bit is used in mptcp_out_options
> structure to avoid passing an address to a local variable. Reading and
> setting it needs CONFIG_MPTCP, so the whole block now has this #if
> condition: mptcp_established_options() is then no longer used without
> CONFIG_MPTCP.
>
> About alternatives, instead of passing a new boolean (has_ts), another
> option would be to pass the whole option structure (opts), but
> 'struct tcp_out_options' is currently defined in tcp_output.c, and it
> would need to be exported. Plus that means the removal of the TCP TS
> option would be done on the MPTCP side, and not here on the TCP side.
> It feels clearer to remove other TCP options from the TCP side, than
> hiding that from the MPTCP side.
>
> Yet an other alternative would be to pass the size already taken by the
> other TCP options, and have a way to drop them all when needed. But this
> feels better to target only the timestamps option where dropping it
> should be safe, even if it is currently the only option that would be
> set before MPTCP, when MPTCP is used.
>
> Reviewed-by: Mat Martineau <martineau@xxxxxxxxxx>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@xxxxxxxxxx>
> ---
> - v2: Avoid passing local variables' addresses to
> mptcp_established_options not to force the compiler to use a stack
> canary in this hot function, even for non-MPTCP flows. (Eric Dumazet)
> To: Neal Cardwell <ncardwell@xxxxxxxxxx>
> To: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> ---
> include/net/mptcp.h | 13 +++----------
> net/ipv4/tcp_output.c | 10 +++++++++-
> net/mptcp/options.c | 2 +-
> 3 files changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index 24d1016a4664..71b9fc5a5796 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -72,7 +72,8 @@ struct mptcp_out_options {
> u8 reset_reason:4,
> reset_transient:1,
> csum_reqd:1,
> - allow_join_id0:1;
> + allow_join_id0:1,
> + drop_ts:1;
> union {
> struct {
> u64 sndr_key;
> @@ -153,7 +154,7 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
> bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
> struct mptcp_out_options *opts);
> int mptcp_established_options(struct sock *sk, struct sk_buff *skb,
> - unsigned int remaining,
> + unsigned int remaining, bool has_ts,
> struct mptcp_out_options *opts);
> bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
>
> @@ -269,14 +270,6 @@ static inline bool mptcp_synack_options(const struct request_sock *req,
> return false;
> }
>
> -static inline int mptcp_established_options(struct sock *sk,
> - struct sk_buff *skb,
> - unsigned int remaining,
> - struct mptcp_out_options *opts)
> -{
> - return -1;
> -}
> -
> static inline bool mptcp_incoming_options(struct sock *sk,
> struct sk_buff *skb)
> {
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index d3b8e61d3c5e..26dd751ec72a 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1175,6 +1175,7 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
> size += TCPOLEN_TSTAMP_ALIGNED;
> }
>
> +#if IS_ENABLED(CONFIG_MPTCP)
> /* MPTCP options have precedence over SACK for the limited TCP
> * option space because a MPTCP connection would be forced to
> * fall back to regular TCP if a required multipath option is
> @@ -1183,15 +1184,22 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
> */
> if (sk_is_mptcp(sk)) {
> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> + bool has_ts = opts->options & OPTION_TS;
> int opt_size;
>
> - opt_size = mptcp_established_options(sk, skb, remaining,
> + opts->mptcp.drop_ts = 0;
> +
> + opt_size = mptcp_established_options(sk, skb, remaining, has_ts,
> &opts->mptcp);
> if (opt_size >= 0) {
> opts->options |= OPTION_MPTCP;
> size += opt_size;
> +
> + if (opts->mptcp.drop_ts)
> + opts->options &= ~OPTION_TS;

I'm wondering if you are ok with this patch in the current form?

One thing that was discussed on the mptcp ML was exposing the
tcp_out_options layout so that mptcp_established_options() could receive
such argument and likely clean-up a bit this code.

Not done here because placing tcp_out_options under `include` felt a bit
"too much". Perhaps adding a `net/ipv4/tcp_option.h` header (and
including it from the mptcp code) would be more palatable?

Thanks,

Paolo