Re: [PATCH bpf v2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: Bobby Eshleman
Date: Wed Jun 17 2026 - 18:36:29 EST
On Fri, Jun 12, 2026 at 12:35:51PM +0000, Sechang Lim wrote:
> sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program
> to find the length of the next message. strparser assembles a message out
> of several received skbs by chaining them onto the head's frag_list and
> recording where to append the next one in strp->skb_nextp:
>
> *strp->skb_nextp = skb;
> strp->skb_nextp = &skb->next;
>
> and then calls the parser on the head:
>
> len = (*strp->cb.parse_msg)(strp, head);
>
> The parser is only meant to inspect the skb, but the program may call
> bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
> bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB.
> Once the head carries a frag_list these go
>
> ... -> skb_ensure_writable -> pskb_may_pull -> __pskb_pull_tail
>
> and __pskb_pull_tail() frees the frag_list skbs that strparser still
> tracks through skb_nextp:
>
> while ((list = skb_shinfo(skb)->frag_list) != insp) {
> skb_shinfo(skb)->frag_list = list->next;
> consume_skb(list);
> }
>
> strp->skb_nextp now points into a freed sk_buff. The next segment of
> the same message arrives in __strp_recv(), which links it with
> *strp->skb_nextp = skb, an 8-byte write into the freed skb. The free
> and the write happen in different __strp_recv() calls, so the message
> has to span at least three segments before it triggers.
>
> BUG: KASAN: slab-use-after-free in __strp_recv+0x447/0xda0
> Write of size 8 at addr ffff88810db86140 by task repro/349
>
> Call Trace:
> <IRQ>
> __strp_recv+0x447/0xda0
> __tcp_read_sock+0x13d/0x590
> tcp_bpf_strp_read_sock+0x195/0x320
> strp_data_ready+0x267/0x340
> sk_psock_strp_data_ready+0x1ce/0x350
> tcp_data_queue+0x1364/0x2fd0
> tcp_rcv_established+0xe07/0x1640
> [...]
>
> Allocated by task 349:
> skb_clone+0x17b/0x210
> __strp_recv+0x2c3/0xda0
> __tcp_read_sock+0x13d/0x590
> [...]
>
> Freed by task 349:
> kmem_cache_free+0x150/0x570
> __pskb_pull_tail+0x57b/0xc20
> skb_ensure_writable+0x236/0x260
> __bpf_skb_change_tail+0x1d4/0x590
> sk_skb_change_tail+0x2a/0x40
> bpf_prog_1b285dcd6c41373e+0x27/0x30
> bpf_prog_run_pin_on_cpu+0xf3/0x260
> sk_psock_strp_parse+0x118/0x1e0
> __strp_recv+0x4f6/0xda0
> [...]
>
> The same resize also leaves the head's length inconsistent with its
> frags, so a later __pskb_pull_tail() can instead hit the
> BUG_ON(skb_copy_bits(...)) in net/core/skbuff.c.
>
> Run the parser on a private clone of the head when the message spans more
> than one skb and the program can modify the packet
> (prog->aux->changes_pkt_data), so a resizing helper can only touch the
> clone and strparser's head and skb_nextp stay valid. Single-skb messages
> have no frag_list and read-only parsers cannot resize, so both are still
> parsed in place. If the clone cannot be allocated, return 0 so the caller
> retries on the next read rather than failing the parser.
>
> Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
> Signed-off-by: Sechang Lim <rhkrqnwk98@xxxxxxxxx>
> ---
> v2:
> - clone only when prog->aux->changes_pkt_data (Bobby Eshleman)
> - return 0 on clone failure instead of -ENOMEM (Bobby Eshleman)
> - free the clone with consume_skb() instead of kfree_skb()
> - drop the unrelated guard(rcu)() change (Bobby Eshleman)
>
> v1:
> - https://lore.kernel.org/all/20260609112316.3685738-1-rhkrqnwk98@xxxxxxxxx/
>
> net/core/skmsg.c | 26 +++++++++++++++++++++++---
> 1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index e1850caf1a71..97e5bc5f38c3 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1149,9 +1149,29 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
> rcu_read_lock();
> prog = READ_ONCE(psock->progs.stream_parser);
> if (likely(prog)) {
> - skb->sk = psock->sk;
> - ret = bpf_prog_run_pin_on_cpu(prog, skb);
> - skb->sk = NULL;
> + struct sk_buff *parse_skb = skb;
> +
> + /*
> + * strparser chains the message skbs through skb->frag_list and
> + * keeps a pointer into that list in strp->skb_nextp. The parser
> + * program may call bpf_skb_change_tail() and friends, which go
> + * through __pskb_pull_tail() and free the frag_list skbs that
> + * strparser still tracks. Run the program on a clone when the head
> + * has a frag_list and the program can modify the packet, so it
> + * cannot drop frags strparser owns.
> + */
> + if (skb_has_frag_list(skb) && prog->aux->changes_pkt_data) {
> + parse_skb = skb_clone(skb, GFP_ATOMIC);
> + if (!parse_skb) {
> + rcu_read_unlock();
> + return 0;
> + }
> + }
> + parse_skb->sk = psock->sk;
> + ret = bpf_prog_run_pin_on_cpu(prog, parse_skb);
> + parse_skb->sk = NULL;
> + if (parse_skb != skb)
> + consume_skb(parse_skb);
> }
> rcu_read_unlock();
> return ret;
> --
> 2.43.0
>
Hey Sechang,
I'm still on the fence about "return 0" vs ENOMEM. I hate to flip-flop
on you here, but now I'm not sure if it is worth the complication to
return 0 since we're really only buying a single timer interval in which
we need 1) suddenly more memory to alloc the clone, and 2) another data
ready event to cause the stream parsing to pick up again. If any one
doesn't happen, the end result is the same. Not sure its a good
trade-off for the complexity of basically tricking the caller with the
zero return. Maybe let's go back to ENOMEM?
BTW, based on the comm name "repro", it sounds like you have a decent
reproducer for this. I wonder if it is possible to add something to the
selftests to catch this?
Best,
Bobby