Re: [PATCH bpf v2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: Sechang Lim
Date: Thu Jun 18 2026 - 01:59:16 EST
On Wed, Jun 17, 2026 at 03:36:07PM -0700, Bobby Eshleman wrote:
On Fri, Jun 12, 2026 at 12:35:51PM +0000, Sechang Lim wrote:
sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser programI'm still on the fence about "return 0" vs ENOMEM. I hate to flip-flop
to find the length of the next message. strparser assembles a message out
of several received skbs by chaining them onto the head's frag_list and
recording where to append the next one in strp->skb_nextp:
*strp->skb_nextp = skb;
strp->skb_nextp = &skb->next;
and then calls the parser on the head:
len = (*strp->cb.parse_msg)(strp, head);
The parser is only meant to inspect the skb, but the program may call
bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB.
Once the head carries a frag_list these go
... -> skb_ensure_writable -> pskb_may_pull -> __pskb_pull_tail
and __pskb_pull_tail() frees the frag_list skbs that strparser still
tracks through skb_nextp:
while ((list = skb_shinfo(skb)->frag_list) != insp) {
skb_shinfo(skb)->frag_list = list->next;
consume_skb(list);
}
strp->skb_nextp now points into a freed sk_buff. The next segment of
the same message arrives in __strp_recv(), which links it with
*strp->skb_nextp = skb, an 8-byte write into the freed skb. The free
and the write happen in different __strp_recv() calls, so the message
has to span at least three segments before it triggers.
BUG: KASAN: slab-use-after-free in __strp_recv+0x447/0xda0
Write of size 8 at addr ffff88810db86140 by task repro/349
Call Trace:
<IRQ>
__strp_recv+0x447/0xda0
__tcp_read_sock+0x13d/0x590
tcp_bpf_strp_read_sock+0x195/0x320
strp_data_ready+0x267/0x340
sk_psock_strp_data_ready+0x1ce/0x350
tcp_data_queue+0x1364/0x2fd0
tcp_rcv_established+0xe07/0x1640
[...]
Allocated by task 349:
skb_clone+0x17b/0x210
__strp_recv+0x2c3/0xda0
__tcp_read_sock+0x13d/0x590
[...]
Freed by task 349:
kmem_cache_free+0x150/0x570
__pskb_pull_tail+0x57b/0xc20
skb_ensure_writable+0x236/0x260
__bpf_skb_change_tail+0x1d4/0x590
sk_skb_change_tail+0x2a/0x40
bpf_prog_1b285dcd6c41373e+0x27/0x30
bpf_prog_run_pin_on_cpu+0xf3/0x260
sk_psock_strp_parse+0x118/0x1e0
__strp_recv+0x4f6/0xda0
[...]
The same resize also leaves the head's length inconsistent with its
frags, so a later __pskb_pull_tail() can instead hit the
BUG_ON(skb_copy_bits(...)) in net/core/skbuff.c.
Run the parser on a private clone of the head when the message spans more
than one skb and the program can modify the packet
(prog->aux->changes_pkt_data), so a resizing helper can only touch the
clone and strparser's head and skb_nextp stay valid. Single-skb messages
have no frag_list and read-only parsers cannot resize, so both are still
parsed in place. If the clone cannot be allocated, return 0 so the caller
retries on the next read rather than failing the parser.
Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
Signed-off-by: Sechang Lim <rhkrqnwk98@xxxxxxxxx>
---
v2:
- clone only when prog->aux->changes_pkt_data (Bobby Eshleman)
- return 0 on clone failure instead of -ENOMEM (Bobby Eshleman)
- free the clone with consume_skb() instead of kfree_skb()
- drop the unrelated guard(rcu)() change (Bobby Eshleman)
v1:
- https://lore.kernel.org/all/20260609112316.3685738-1-rhkrqnwk98@xxxxxxxxx/
net/core/skmsg.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index e1850caf1a71..97e5bc5f38c3 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1149,9 +1149,29 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
rcu_read_lock();
prog = READ_ONCE(psock->progs.stream_parser);
if (likely(prog)) {
- skb->sk = psock->sk;
- ret = bpf_prog_run_pin_on_cpu(prog, skb);
- skb->sk = NULL;
+ struct sk_buff *parse_skb = skb;
+
+ /*
+ * strparser chains the message skbs through skb->frag_list and
+ * keeps a pointer into that list in strp->skb_nextp. The parser
+ * program may call bpf_skb_change_tail() and friends, which go
+ * through __pskb_pull_tail() and free the frag_list skbs that
+ * strparser still tracks. Run the program on a clone when the head
+ * has a frag_list and the program can modify the packet, so it
+ * cannot drop frags strparser owns.
+ */
+ if (skb_has_frag_list(skb) && prog->aux->changes_pkt_data) {
+ parse_skb = skb_clone(skb, GFP_ATOMIC);
+ if (!parse_skb) {
+ rcu_read_unlock();
+ return 0;
+ }
+ }
+ parse_skb->sk = psock->sk;
+ ret = bpf_prog_run_pin_on_cpu(prog, parse_skb);
+ parse_skb->sk = NULL;
+ if (parse_skb != skb)
+ consume_skb(parse_skb);
}
rcu_read_unlock();
return ret;
--
2.43.0
on you here, but now I'm not sure if it is worth the complication to
return 0 since we're really only buying a single timer interval in which
we need 1) suddenly more memory to alloc the clone, and 2) another data
ready event to cause the stream parsing to pick up again. If any one
doesn't happen, the end result is the same. Not sure its a good
trade-off for the complexity of basically tricking the caller with the
zero return. Maybe let's go back to ENOMEM?
Per Kuniyuki's and Jiayuan's suggestion, v3 will reject a packet-modifying
stream parser at attach time instead of runtime, so the return-0 vs
ENOMEM question goes away with that code.
BTW, based on the comm name "repro", it sounds like you have a decent
reproducer for this. I wonder if it is possible to add something to the
selftests to catch this?
I will add an selftest in v3.
Best,
Sechang