Re: [PATCH bpf] bpf, sockmap: fix use-after-free when the stream parser resizes the skb

From: Bobby Eshleman

Date: Tue Jun 09 2026 - 14:27:46 EST


On Tue, Jun 09, 2026 at 11:23:03AM +0000, Sechang Lim wrote:
> sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program to
> find the length of the next message. strparser assembles a message out of
> several received skbs by chaining them onto the head's frag_list and recording
> where to append the next one in strp->skb_nextp:
>
> *strp->skb_nextp = skb;
> strp->skb_nextp = &skb->next;
>
> and then calls the parser on the head:
>
> len = (*strp->cb.parse_msg)(strp, head);
>
> The parser is only meant to inspect the skb, but the program may call
> bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
> bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB. Once the
> head carries a frag_list these go
>
> ... -> skb_ensure_writable -> pskb_may_pull -> __pskb_pull_tail
>
> and __pskb_pull_tail() frees the frag_list skbs that strparser still tracks
> through skb_nextp:
>
> while ((list = skb_shinfo(skb)->frag_list) != insp) {
> skb_shinfo(skb)->frag_list = list->next;
> consume_skb(list);
> }
>
> strp->skb_nextp now points into a freed sk_buff. The next segment of the same
> message arrives in __strp_recv(), which links it with *strp->skb_nextp = skb,
> an 8-byte write into the freed skb. The free and the write happen in different
> __strp_recv() calls, so the message has to span at least three segments before
> it triggers.
>
> BUG: KASAN: slab-use-after-free in __strp_recv+0x447/0xda0
> Write of size 8 at addr ffff88810db86140 by task repro/349
>
> Call Trace:
> <IRQ>
> __strp_recv+0x447/0xda0
> __tcp_read_sock+0x13d/0x590
> tcp_bpf_strp_read_sock+0x195/0x320
> strp_data_ready+0x267/0x340
> sk_psock_strp_data_ready+0x1ce/0x350
> tcp_data_queue+0x1364/0x2fd0
> tcp_rcv_established+0xe07/0x1640
> [...]
>
> Allocated by task 349:
> skb_clone+0x17b/0x210
> __strp_recv+0x2c3/0xda0
> __tcp_read_sock+0x13d/0x590
> [...]
>
> Freed by task 349:
> kmem_cache_free+0x150/0x570
> __pskb_pull_tail+0x57b/0xc20
> skb_ensure_writable+0x236/0x260
> __bpf_skb_change_tail+0x1d4/0x590
> sk_skb_change_tail+0x2a/0x40
> bpf_prog_1b285dcd6c41373e+0x27/0x30
> bpf_prog_run_pin_on_cpu+0xf3/0x260
> sk_psock_strp_parse+0x118/0x1e0
> __strp_recv+0x4f6/0xda0
> [...]
>
> The same resize also leaves the head's length inconsistent with its frags, so
> a later __pskb_pull_tail() can instead hit the BUG_ON(skb_copy_bits(...)) in
> net/core/skbuff.c.
>
> Run the parser on a private clone of the head whenever the message spans more
> than one skb, so a resizing helper can only touch the clone and strparser's
> head and skb_nextp stay valid. Single-skb messages have no frag_list and are
> still parsed in place.
>
> Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
> Signed-off-by: Sechang Lim <rhkrqnwk98@xxxxxxxxx>
> ---
> net/core/skmsg.c | 26 +++++++++++++++++++++-----
> 1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index e1850caf1a71..d5b10f9b0ba8 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1146,14 +1146,30 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
> struct bpf_prog *prog;
> int ret = skb->len;
>
> - rcu_read_lock();
> + guard(rcu)();

I'd probably not mix this rcu change into this patch, though I
understand it drops an unlock() call in the new err return.

> prog = READ_ONCE(psock->progs.stream_parser);
> if (likely(prog)) {
> - skb->sk = psock->sk;
> - ret = bpf_prog_run_pin_on_cpu(prog, skb);
> - skb->sk = NULL;
> + struct sk_buff *parse_skb = skb;
> +
> + /*
> + * strparser chains the message skbs through skb->frag_list and
> + * keeps a pointer into that list in strp->skb_nextp. The parser
> + * program may call bpf_skb_change_tail() and friends, which go
> + * through __pskb_pull_tail() and free the frag_list skbs that
> + * strparser still tracks. Run the program on a clone when a
> + * frag_list is present so it cannot drop frags strparser owns.
> + */
> + if (skb_has_frag_list(skb)) {
> + parse_skb = skb_clone(skb, GFP_ATOMIC);
> + if (!parse_skb)
> + return -ENOMEM;

I wonder if this could be further gated by prog->aux->changes_pkt_data?

One possible issue with adding an allocation here and returning -ENOMEM
is that this will shutdown stream parsing permanently for this sk when
under memory pressure, which is new behavior. Maybe, if we return 0, it
looks like the caller's accounting stays true by taking the "needs more
header" path, and then we can try again from the same place on the next
data_ready, and so be resilient even under memory pressure? I guess the
trade-off being that their may not come another data_ready to cause
progress... not sure which is better. Maybe strp_start_timer could be
used here too?

Best,
Bobby