Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
From: KaFai Wan
Date: Fri Apr 17 2026 - 05:38:45 EST
On Fri, 2026-04-17 at 10:43 +0800, Jiayuan Chen wrote:
>
> On 4/16/26 7:23 PM, KaFai Wan wrote:
> > A BPF_SOCK_OPS program can enable
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> > bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
> >
> > In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
> > __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().
> >
> > From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
> > tcp_current_mss(), which calls tcp_established_options() and re-enters
> > bpf_skops_hdr_opt_len().
> >
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > -> bpf_setsockopt(TCP_NODELAY)
> > -> tcp_push_pending_frames()
> > -> tcp_current_mss()
> > -> tcp_established_options()
> > -> bpf_skops_hdr_opt_len()
> > -> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> >
> > From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
> > tcp_write_xmit(), which calls tcp_transmit_skb(). That path recomputes
> > header option length through tcp_established_options() and
> > bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().
> >
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> > -> bpf_setsockopt(TCP_NODELAY)
> > -> tcp_push_pending_frames()
> > -> tcp_write_xmit()
> > -> tcp_transmit_skb()
> > -> tcp_established_options()
> > -> bpf_skops_hdr_opt_len()
> > -> bpf_skops_write_hdr_opt()
> > -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> >
> > This leads to unbounded recursion and can overflow the kernel stack.
> >
> > Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
> > when bpf_setsockopt() is called from
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
> >
> > Reported-by: Quan Sun <2022090917019@xxxxxxxxxxxxxxxx>
> > Reported-by: Yinhao Hu <dddddd@xxxxxxxxxxx>
> > Reported-by: Kaiyan Mei <M202472210@xxxxxxxxxxx>
> > Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@xxxxxxxxxxxxxxxx/
> > Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> > Signed-off-by: KaFai Wan <kafai.wan@xxxxxxxxx>
> > ---
> > net/core/filter.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index fcfcb72663ca..911ff04bca5a 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> > if (!is_locked_tcp_sock_ops(bpf_sock))
> > return -EOPNOTSUPP;
> >
> > + if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> > + bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> > + IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
> > + return -EOPNOTSUPP;
> > +
> > return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
> > }
> >
>
> A simple comment is recommended:
>
> /* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these
> callbacks. */
>
ok. Added in new version.
>
> Also like Martin pointed before, BPF_SOCK_OPS_HDR_OPT_LEN_CB /
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
>
> can only be produced under CONFIG_INET so IS_ENABLED(CONFIG_INET) is dead.
>
--
Thanks,
KaFai